Integrated noise reduction

ABSTRACT

Presented herein are techniques for generating an integrated estimate of a target sound (e.g., speech) in sound signals received by at least a local microphone array of a device. In embodiments, the integrated estimate may be generated based on sound signals received by at least the local microphone array of the device and at least one external microphone.

BACKGROUND

Field of the Invention

The present invention generally relates to integrated noise reduction for devices having at least one local microphone array.

Related Art

Hearing loss is a type of sensory impairment that is generally of two types, namely conductive and/or sensorineural. Conductive hearing loss occurs when the normal mechanical pathways of the outer and/or middle ear are impeded, for example, by damage to the ossicular chain or ear canal. Sensorineural hearing loss occurs when there is damage to the inner ear, or to the nerve pathways from the inner ear to the brain.

Individuals who suffer from conductive hearing loss typically have some form of residual hearing because the hair cells in the cochlea are undamaged. As such, individuals suffering from conductive hearing loss typically receive an auditory prosthesis that generates motion of the cochlea fluid. Such auditory prostheses include, for example, acoustic hearing aids, bone conduction devices, and direct acoustic stimulators.

In many people who are profoundly deaf, however, the reason for their deafness is sensorineural hearing loss. Those suffering from some forms of sensorineural hearing loss are unable to derive suitable benefit from auditory prostheses that generate mechanical motion of the cochlea fluid. Such individuals can benefit from implantable auditory prostheses that stimulate nerve cells of the recipient's auditory system in other ways (e.g., electrical, optical and the like). Cochlear implants are often proposed when the sensorineural hearing loss is due to the absence or destruction of the cochlea hair cells, which transduce acoustic signals into nerve impulses. An auditory brainstem stimulator is another type of stimulating auditory prosthesis that might also be proposed when a recipient experiences sensorineural hearing loss due to damage to the auditory nerve.

SUMMARY

In one aspect, a method is provided. The method comprises: receiving sound signals with at least a local microphone array of a device, wherein the sound signals comprise at least one target sound; generating an a priori estimate of the at least one target sound in the received sound signals based on a predetermined location of a source of the at least one target sound; generating a direct estimate of the at least one target sound in the received sound signals based on a real-time estimate of a location of a source of the at least one target sound; and generating a weighted combination of the a priori estimate and the direct estimate, wherein the weighted combination is an integrated estimate of the target sound.

In another aspect, a device is provided. The device comprises: a local microphone array configured to receive sound signals, wherein the sound signals comprise at least one target sound; and one or more processors configured to: generate an a priori estimate of the at least one target sound in the received sound signals using only an a priori relative transfer function (RTF) vector, generate a direct estimate of the at least one target sound in the received sound signals using only an estimated RTF vector generated from the received sound signals, and generate a weighted combination of the a priori estimate and the direct estimate, wherein the weighted combination is an integrated estimate of the target sound.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention are described herein in conjunction with the accompanying drawings, in which:

FIG. 1 is a functional block diagram illustrating the generation of pre-whitened-transformed signals;

FIG. 2 is a functional block diagram illustrating the generation of an a priori estimate of at least one target sound in sound signals received at a local microphone array;

FIG. 3 is a functional block diagram illustrating the generation of a direct estimate of at least one target sound in sound signals received at a local microphone array;

FIG. 4 is a functional block diagram illustrating the generation of an integrated estimate of at least one target sound in sound signals received at a local microphone array;

FIG. 5 is a functional block diagram illustrating the generation of an a priori estimate of at least one target sound in sound signals received at a local microphone array and at least one external microphone;

FIG. 6 is a functional block diagram illustrating the generation of a direct estimate of at least one target sound in sound signals received at a local microphone array and at least one external microphone;

FIG. 7 is a functional block diagram illustrating the generation of an integrated estimate of at least one target sound in sound signals received at a local microphone array and at least one external microphone;

FIG. 8 is a flowchart of a two-stage process, in accordance with embodiments presented herein;

FIG. 9 is a table summarizing the various noise reduction strategies, in accordance with embodiments presented herein;

FIG. 10A is a schematic diagram illustrating a cochlear implant, in accordance with certain embodiments presented herein;

FIG. 10B is a block diagram of the cochlear implant of FIG. 10A;

FIG. 11 is a block diagram of a totally implantable cochlear implant, in accordance with certain embodiments presented herein;

FIG. 12 is a block diagram of a bone conduction device that includes a spatial pre-filter, in accordance with embodiments presented herein; and

FIG. 13 is a flowchart of a method, in accordance with embodiments presented herein.

DETAILED DESCRIPTION

I. Introduction

In devices having one or more microphone arrays, such as auditory prostheses (e.g., hearing aids, cochlear implants, bone conduction devices, etc.), multi-microphone noise reduction systems are used to preserve desired sounds (e.g., speech), while rejecting unwanted sounds (e.g., noise). In certain conventional noise reduction systems, a local microphone array (LMA) worn on the recipient (i.e., part of the device) is used to focus on a sound source (e.g., speaker) that is in a predefined direction, such as directly in front of the recipient. While such a noise reduction system may be robust, it is also prone to poor performance in situations where the desired speaker is not in the predefined direction. Examples of such situations may be found in classroom environments or while a recipient is travelling in a motor vehicle. The integrated noise reduction techniques presented herein improve upon these existing noise reduction systems in several distinct ways: (i) by including the ability to focus on a target sound source (e.g., speaker) that is not in the predefined direction and, in certain arrangements, (ii) by including external microphones (XMs) that operate together with the LMA, resulting in further noise reduction as opposed to using only the LMA.

In certain embodiments presented herein, integrated noise reduction techniques will utilize two separate tuning parameters, one for controlling the sound received from the predefined direction, and the other for the sound received from an estimated direction where the target sound source may be located. In these embodiments, each of these directions can be defined using the LMA and the XMs. In order to define the predefined direction with the LMA and the XMs, a modified version of the improved method of estimation of a transfer function for the XM is used, where the input signals have to undergo a specific series of transformations.

Using one or several XMs along with the LMA can provide significant speech intelligibility improvement, for instance in the case where the XMs are quite close to the desired speaker, or even where an XM provides a relevant noise reference. Additionally, the integrated noise reduction techniques presented herein are flexible in that they encompass a wide range of noise reduction options according to the tuning of the system.

For ease of understanding, the following description is organized into several sections. In particular, section II describes a data model, which considers the general case of a local microphone array (LMA) in conjunction with one or several external microphones (XMs), which can be reduced to a single external microphone without compromising the equations provided herein. A transformed domain, as well as a pre-whitened-transformed domain, is also introduced in order to simplify the flow of signal processing operations and realize distinct digital signal processing (DSP) block schemes.

In section III, an integrated minimum variance distortionless response (MVDR) beamformer is discussed as applied to a local microphone array. In particular, section III describes an integrated MVDR beamformer, which leverages the use of a priori assumptions and the use of estimated quantities. In section IV, an integrated MVDR beamformer as applied to a local microphone array together with one or more external microphones is described. Again, an integrated MVDR beamformer for application to a local microphone array together with one or more external microphones, which leverages the use of a priori assumptions and the use of estimated quantities, is described.

II. Data Model

A. Unprocessed Signals

Consider a noise reduction system that consists of a local microphone array (LMA) of M_(a) microphones and M_(e) external microphones, providing a total of M_(a)+M_(e) microphones. Also consider a scenario where there is only one desired/target sound source, such as a target speech source, in a noisy environment. Proceeding to formulate the problem in the short-time Fourier transform (STFT) domain, the received signal can be represented at one particular frequency, k, and one time frame, l, as:

$\begin{matrix}{{y\left( {k,l} \right)} = {{x\left( {k,l} \right)} + {n\left( {k,l} \right)}}} & (1) \\{= {{{a\left( {k,l} \right)}{s\left( {k,l} \right)}} + {n\left( {k,l} \right)}}} & (2)\end{matrix}$

where (dropping the dependency on k and l for brevity), y=[y_(a) ^(T) y_(e) ^(T)]^(T), y_(a)=[y_(a,1) y_(a,2) . . . y_(a,M_(a))]^(T) are the local microphone signals, y_(e)=[y_(e,1) y_(e,2) . . . y_(e,M_(e))]^(T) are the external microphone signals, x is the speech component consisting of a=[a_(a) ^(T) a_(e) ^(T)]^(T), the acoustic transfer function (ATF) from the speech source to all M_(a)+M_(e) microphones, and s, the speech source signal. Finally, n=[n_(a) ^(T) n_(e) ^(T)]^(T) represents the noise component, which consists of a combination of correlated and uncorrelated noises. Variables with the subscript “a” refer to the LMA signals and variables with the subscript “e” refer to the XM signals. The dependencies on k and l will be introduced herein, as needed, for mathematical derivations.

In general, the speech component (target sound), x, can be represented in terms of a relative transfer function (RTF) vector such that:

x=as=hs₁  (3)

where s₁=a_(a,1)s is the speech in a reference microphone of the LMA (w.l.o.g., the first microphone is chosen as the reference microphone) and h is the RTF vector defined as:

$\begin{matrix}{h = {\left\lbrack {{1\frac{a_{a,2}}{a_{a,1}}\cdots\frac{a_{a,M_{a}}}{a_{a,1}}}❘{\frac{a_{e,1}}{a_{a,1}}\cdots\frac{a_{e,M_{e}}}{a_{a,1}}}} \right\rbrack^{T} = {\left\lbrack {{1h_{a,2}{\cdots h}_{a,M_{a}}}❘{h_{e,1}h_{e,2}{\cdots h}_{e,M_{e}}}} \right\rbrack^{T} = \left\lbrack {h_{a}^{T}❘h_{e}^{T}} \right\rbrack^{T}}}} & (4)\end{matrix}$

consisting of an RTF vector corresponding to the LMA signals, h_(a), and an RTF vector corresponding to the XM signals, h_(e). With such a formulation, the noise reduction system will aim to produce an estimate for the speech component in the reference microphone, s₁.
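By way of a non-limiting illustration, the relation between the ATF vector and the RTF vector in (3) and (4) can be sketched numerically for a single frequency bin. The array sizes, the random ATF values, and the variable names below are assumptions made only for this example and are not taken from the embodiments.

```python
import numpy as np

rng = np.random.default_rng(0)
M_a, M_e = 3, 2                      # hypothetical numbers of LMA and XM microphones
M = M_a + M_e

# Hypothetical complex ATF vector a (one frequency bin) and source sample s.
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
s = 0.7 - 0.2j

h = a / a[0]          # RTF vector relative to the first LMA microphone, as in (4)
s1 = a[0] * s         # speech in the reference microphone

x_atf = a * s         # x = a s
x_rtf = h * s1        # x = h s1, as in (3)

h_a, h_e = h[:M_a], h[M_a:]   # LMA part and XM part of the RTF vector
```

In this sketch both expressions for x agree, and the first element of h equals one because the first LMA microphone is the reference.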

The (M_(a)+M_(e))×(M_(a)+M_(e)) speech-plus-noise, noise-only, and speech-only spatial correlation matrices are given respectively as:

$\begin{matrix}{R_{yy} = {\mathbb{E}\left\{ {yy^{H}} \right\}}} & (5)\end{matrix}$

$\begin{matrix}{R_{nn} = {\mathbb{E}\left\{ {nn^{H}} \right\}}} & (6)\end{matrix}$

$\begin{matrix}{R_{xx} = {\mathbb{E}\left\{ {xx^{H}} \right\}}} & (7)\end{matrix}$

where $\mathbb{E}\{.\}$ is the expectation operator and H is the Hermitian transpose. It is assumed that the speech components are uncorrelated with the noise components, and hence the speech-only correlation matrix can be found from the difference of the speech-plus-noise correlation matrix and the noise-only correlation matrix:

R _(xx) =R _(yy) −R _(nn)  (8)

The speech-plus-noise and noise-only correlation matrices are estimated from the received microphone signals during speech-plus-noise and noise-only periods, using a voice activity detector (VAD). The correlation matrices can also be calculated solely for the LMA signals respectively as $R_{y_{a}y_{a}} = \mathbb{E}\{y_{a}y_{a}^{H}\}$, $R_{n_{a}n_{a}} = \mathbb{E}\{n_{a}n_{a}^{H}\}$, and $R_{x_{a}x_{a}} = \mathbb{E}\{x_{a}x_{a}^{H}\}$ (which can be realized by the top left (M_(a)×M_(a)) block of the corresponding entire correlation matrices in (5)-(7)).
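As a minimal sketch of the estimation just described, the following example averages outer products over frames flagged by a VAD and then applies (8). The synthetic signals, the random VAD decisions, the spatially white noise, and all names are assumptions chosen for illustration; a practical system would typically use recursive averaging per frequency bin.

```python
import numpy as np

rng = np.random.default_rng(1)
M_a, M, frames = 3, 4, 20000                  # hypothetical: 3 LMA mics + 1 XM
h = np.array([1.0, 0.8 - 0.3j, 0.5 + 0.4j, -0.2 + 0.6j])  # hypothetical RTF vector

def cgauss(*shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

vad = rng.random(frames) < 0.5                # hypothetical VAD decision per frame
s1 = 2.0 * cgauss(frames) * vad               # speech (power 4) only when VAD is True
n = cgauss(M, frames)                         # spatially white noise, for simplicity
y = h[:, None] * s1[None, :] + n              # y = h s1 + n for every frame

def corr(sig):
    """Sample estimate of E{sig sig^H} over the available frames."""
    return sig @ sig.conj().T / sig.shape[1]

R_yy = corr(y[:, vad])        # speech-plus-noise periods, estimate of (5)
R_nn = corr(y[:, ~vad])       # noise-only periods, estimate of (6)
R_xx = R_yy - R_nn            # speech-only correlation matrix, as in (8)

R_yaya = R_yy[:M_a, :M_a]     # LMA-only matrix: top-left block of R_yy

R_xx_true = 4.0 * np.outer(h, h.conj())
rel_err = np.linalg.norm(R_xx - R_xx_true) / np.linalg.norm(R_xx_true)
```

With enough frames, the difference R_yy − R_nn approaches the rank-1 speech correlation matrix, which is the property exploited later when the RTF vector is estimated.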

The estimate of the speech component in the reference microphone, z₁, is then obtained through the linear filtering of the microphone signals, such that:

$\begin{matrix}{z_{1} = {w^{H}y}} & (9)\end{matrix}$

where w=[w_(a) ^(T) w_(e) ^(T)]^(T) is the complex-valued filter to be designed.

B. Transformed Domain

As will be described later, working with the signals in a transformed domain will result in convenient relations to be made and an overall simplification of the flow of signal processing operations. The transformation will be based on an a priori assumed RTF vector for the LMA signals, {tilde over (h)}_(a) (which may or may not be equal to h_(a)). Firstly, an M_(a)×(M_(a)−1) unitary blocking matrix B_(a) for {tilde over (h)}_(a) and an M_(a)×1 vector b_(a) are defined such that:

$\begin{matrix}{{{B_{a}^{H}{\overset{\sim}{h}}_{a}} = 0};{b_{a} = \frac{{\overset{\sim}{h}}_{a}}{\left\| {\overset{\sim}{h}}_{a} \right\|}}} & (10)\end{matrix}$

where B_(a) ^(H)B_(a)=I_((M_(a)−1)) and, in general, I_(ϑ) denotes the ϑ×ϑ identity matrix, and b_(a) can be interpreted as a scaled matched filter. W.l.o.g., b_(a) will simply be referred to as a matched filter in the following derivations. Using B_(a) and b_(a), an (M_(a)+M_(e))×(M_(a)+M_(e)) unitary transformation matrix, T, can be subsequently defined:

$\begin{matrix}{T = {\begin{bmatrix}T_{a} & 0 \\0 & I_{M_{e}}\end{bmatrix} = \begin{bmatrix}\left\lbrack {B_{a}b_{a}} \right\rbrack & 0 \\0 & I_{M_{e}}\end{bmatrix}}} & (11)\end{matrix}$

where T_(a)=[B_(a), b_(a)], T_(a) ^(H)T_(a)=I_(M_(a)), and hence indeed T^(H)T=I_((M_(a)+M_(e))). Consequently, the transformed input signals, y, become:

$\begin{matrix}{{T^{H}y} = {\begin{bmatrix}{T_{a}^{H}y_{a}} \\y_{e}\end{bmatrix} = \begin{bmatrix}{B_{a}^{H}y_{a}} \\{b_{a}^{H}y_{a}} \\y_{e}\end{bmatrix}}} & (12)\end{matrix}$

The transformed noise signals can also be similarly defined:

$\begin{matrix}{{T^{H}n} = {\begin{bmatrix}{T_{a}^{H}n_{a}} \\n_{e}\end{bmatrix} = \begin{bmatrix}{B_{a}^{H}n_{a}} \\{b_{a}^{H}n_{a}} \\n_{e}\end{bmatrix}}} & (13)\end{matrix}$

It should be understood that this transformation domain is the LMA signals that pass through a blocking matrix and a matched filter, as in the first stage of a generalized sidelobe canceller (GSC) (i.e., the adaptive implementation of an MVDR beamformer), along with the XM signals.
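One possible way to construct a blocking matrix and matched filter satisfying (10) is sketched below. The use of a complete QR decomposition to obtain an orthonormal basis for the complement of the a priori RTF vector is an assumption of this example (other constructions are equally valid), as are the numerical values and names.

```python
import numpy as np

rng = np.random.default_rng(2)
M_a = 4
h_tilde = np.array([1.0, 0.9 - 0.1j, 0.7 + 0.2j, 0.6 - 0.3j])  # hypothetical a priori RTF

b_a = h_tilde / np.linalg.norm(h_tilde)        # matched filter with unit norm, (10)

# A complete QR decomposition of h_tilde yields a unitary Q whose first column is
# parallel to h_tilde; the remaining columns span its orthogonal complement.
Q, _ = np.linalg.qr(h_tilde[:, None], mode="complete")
B_a = Q[:, 1:]                                  # M_a x (M_a - 1) blocking matrix

T_a = np.hstack([B_a, b_a[:, None]])            # T_a = [B_a, b_a]

y_a = rng.standard_normal(M_a) + 1j * rng.standard_normal(M_a)   # hypothetical LMA frame
y_t = T_a.conj().T @ y_a                        # transformed LMA signals, as in (12)
```

In this sketch the first M_a−1 transformed components contain no contribution from the a priori direction, while the last component is the matched-filter output.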

C. Pre-Whitened-Transformed Domain

A spatial pre-whitening operation can be defined from the noise-only correlation matrix in the previously described transform domain by using the Cholesky decomposition:

$\begin{matrix}{{\mathbb{E}\left\{ {\left( {T^{H}n} \right)\left( {T^{H}n} \right)^{H}} \right\}} = {LL^{H}}} & (14)\end{matrix}$

where L is an (M_(a)+M_(e))×(M_(a)+M_(e)) lower triangular matrix. In block form, L can be realized as:

$\begin{matrix}{L = \begin{bmatrix}L_{a} & 0 \\ \ast & L_{x}\end{bmatrix}} & (15)\end{matrix}$

where L_(a) and L_(x) are lower triangular matrices and ∗ denotes the M_(e)×M_(a) lower-left block. It should be noted that L_(a) corresponds to the LMA signals and is obtained from a Cholesky decomposition of the noise correlation matrix of the LMA signals in the transformed domain, hence:

$\begin{matrix}{{\mathbb{E}\left\{ {\left( {T_{a}^{H}n_{a}} \right)\left( {T_{a}^{H}n_{a}} \right)^{H}} \right\}} = {L_{a}L_{a}^{H}}} & (16)\end{matrix}$

A signal vector in the transformed domain can consequently be pre-whitened by pre-multiplying it with L⁻¹. Such signal quantities will be denoted with the underbar notation, $\underset{\_}{(.)}$. Hence, the signal y in this so-called pre-whitened-transformed domain is given by:

$\begin{matrix}\begin{matrix}{\underset{\_}{y} = {\begin{bmatrix}{\underset{\_}{y}}_{a} \\{\underset{\_}{y}}_{e}\end{bmatrix} = {L^{- 1}T^{H}y}}}\end{matrix} & (17)\end{matrix}$

and similarly for n:

$\begin{matrix}{\underset{\_}{n} = \begin{bmatrix}{\underset{\_}{n}}_{a} \\{\underset{\_}{n}}_{e}\end{bmatrix} = {L^{- 1}T^{H}n}} & (18)\end{matrix}$

The respective correlation matrices are also given by:

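$\begin{matrix}{{\underset{\_}{R}}_{yy} = {\mathbb{E}\left\{ {\underset{\_}{y}\,{\underset{\_}{y}}^{H}} \right\}}} & (19)\end{matrix}$

$\begin{matrix}{{\underset{\_}{R}}_{nn} = {\mathbb{E}\left\{ {\underset{\_}{n}\,{\underset{\_}{n}}^{H}} \right\}} = I_{({M_{a} + M_{e}})}} & (20)\end{matrix}$

$\begin{matrix}{{\underset{\_}{R}}_{xx} = {{\underset{\_}{R}}_{yy} - {\underset{\_}{R}}_{nn}}} & (21)\end{matrix}$

The speech-plus-noise, noise-only, and speech-only spatial correlation matrices can also be calculated solely for the LMA signals respectively as ${\underset{\_}{R}}_{y_{a}y_{a}} = \mathbb{E}\{{\underset{\_}{y}}_{a}{\underset{\_}{y}}_{a}^{H}\}$, ${\underset{\_}{R}}_{n_{a}n_{a}} = I_{M_{a}}$, and ${\underset{\_}{R}}_{x_{a}x_{a}} = {\underset{\_}{R}}_{y_{a}y_{a}} - {\underset{\_}{R}}_{n_{a}n_{a}}$.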
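The pre-whitening of (14)-(18) can be sketched as follows. The noise correlation matrix is a randomly generated Hermitian positive definite matrix, and the helper name `prewhiten` is hypothetical; both are assumptions of this example only.

```python
import numpy as np

rng = np.random.default_rng(3)
M_a, M_e = 3, 1
M = M_a + M_e

h_tilde = np.array([1.0, 0.8 + 0.2j, 0.6 - 0.1j])        # hypothetical a priori RTF
Q, _ = np.linalg.qr(h_tilde[:, None], mode="complete")
T_a = np.hstack([Q[:, 1:], (h_tilde / np.linalg.norm(h_tilde))[:, None]])
T = np.block([[T_a, np.zeros((M_a, M_e))],
              [np.zeros((M_e, M_a)), np.eye(M_e)]])        # block-diagonal T, as in (11)

A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_nn = A @ A.conj().T + M * np.eye(M)                      # hypothetical noise correlation

L = np.linalg.cholesky(T.conj().T @ R_nn @ T)              # lower triangular L of (14)
L_a = L[:M_a, :M_a]                                        # LMA block of L

def prewhiten(v):
    """Return L^{-1} T^H v, i.e. v in the pre-whitened-transformed domain (17)."""
    return np.linalg.solve(L, T.conj().T @ v)

# Correlation matrix of the pre-whitened-transformed noise: L^{-1} T^H R_nn T L^{-H}.
W = np.linalg.solve(L, T.conj().T)
R_nn_pw = W @ R_nn @ W.conj().T
```

In this sketch the pre-whitened noise correlation matrix equals the identity, and the top-left block of L coincides with the Cholesky factor obtained from the LMA signals alone, consistent with (16).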

D. Summary of Symbols and Realization

FIG. 1 is a block diagram illustrating the flow of the previously described transformations on the unprocessed signals. Transformation block 102 is a processing block that represents the first transformation of section II-B, in which the LMA signals pass through a blocking matrix 104 and a matched filter 106, analogous to the first stage of a GSC. The XM signals are unaltered. The pre-whitening block 108 is a processing block that represents the pre-whitening operation of section II-C, yielding signals 109 in the pre-whitened-transformed domain. The noise reduction filters that will be developed below will then be directly applied to these pre-whitened-transformed signals (i.e., the output of pre-whitening block 108) in order to yield the desired speech estimate.

The following is also a summary of how the symbolic notation should be interpreted throughout this document:

-   (.)_(a) refers to quantities associated with the LMA signals, e.g., y_(a).
-   (.)_(e) refers to quantities associated with the XM signals, e.g., y_(e).
-   {tilde over (.)} refers to a priori assumed quantities, e.g., {tilde over (h)}.
-   {circumflex over (.)} refers to estimated quantities, e.g., ĥ.
-   $\underset{\_}{(.)}$ refers to quantities in the pre-whitened-transformed domain, e.g., ${\underset{\_}{y}}_{a}$.

III. MVDR Using an LMA (MVDR_(a))

The MVDR beamformer minimizes the total noise power (minimum variance), while preserving the received signal in a particular direction (distortionless response). This direction is specified by defining the appropriate RTF vector for the MVDR beamformer. Considering only the LMA, the MVDR problem can be formulated as follows (which will be referred to as the MVDR_(a)):

$\begin{matrix}{\begin{matrix}\min\limits_{w_{a}} & {w_{a}^{H}R_{n_{a}n_{a}}w_{a}} \\{s.t.} & {{w_{a}^{H}h_{a}} = 1}\end{matrix}} & (22)\end{matrix}$

where h_(a) is the RTF vector from (4), which in practice is unknown and hence will be replaced either by a priori assumptions or estimated from the speech-plus-noise correlation matrices. The optimal noise reduction filter is then given by:

$\begin{matrix}{w_{a} = \frac{R_{n_{a}n_{a}}^{- 1}h_{a}}{h_{a}^{H}R_{n_{a}n_{a}}^{- 1}h_{a}}} & (23)\end{matrix}$

Finally, the speech estimate, z_(a,1), from this MVDR_(a) beamformer is obtained through the linear filtering of the microphone signals with the complex-valued filter w_(a):

z_(a,1)=w_(a) ^(H)y_(a)  (24)
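The closed-form filter of (23) can be illustrated with a short sketch. The function name `mvdr_weights`, the random noise correlation matrix, and the RTF values are assumptions introduced only for this example.

```python
import numpy as np

def mvdr_weights(R_nn, h):
    """Return w = R_nn^{-1} h / (h^H R_nn^{-1} h), as in (23)."""
    Rinv_h = np.linalg.solve(R_nn, h)
    return Rinv_h / (h.conj() @ Rinv_h)

rng = np.random.default_rng(4)
M_a = 4
A = rng.standard_normal((M_a, M_a)) + 1j * rng.standard_normal((M_a, M_a))
R_nn = A @ A.conj().T + np.eye(M_a)          # hypothetical LMA noise correlation matrix
h_a = np.array([1.0, 0.7 - 0.4j, 0.3 + 0.5j, -0.1 + 0.2j])   # hypothetical RTF vector

w_a = mvdr_weights(R_nn, h_a)
noise_out = np.real(w_a.conj() @ R_nn @ w_a)   # output noise power of the MVDR_a filter

# Reference filter that simply selects the reference microphone; it also satisfies
# the distortionless constraint, so it cannot have lower noise power than w_a.
w_ref = np.zeros(M_a, dtype=complex)
w_ref[0] = 1.0
noise_ref = np.real(w_ref.conj() @ R_nn @ w_ref)

y_a = rng.standard_normal(M_a) + 1j * rng.standard_normal(M_a)   # hypothetical LMA frame
z_a1 = w_a.conj() @ y_a                                           # speech estimate, (24)
```

The comparison with the reference-microphone filter is included only to show the minimum variance property under the same distortionless constraint.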

In sections III-A and III-B, strategies for designing an MVDR_(a) beamformer using an RTF vector based either on a priori assumptions or estimated from the speech-plus-noise correlation matrices are discussed. Section III-C illustrates an integrated beamformer that integrates the use of a priori assumptions with estimates.

A. Using an a priori Assumed RTF Vector

The MVDR_(a) problem can be formulated as in (22), except using an a priori assumed RTF vector, {tilde over (h)}_(a)=[1 {tilde over (h)}_(a,2) . . . {tilde over (h)}_(a,M_(a))]^(T), instead of h_(a). This {tilde over (h)}_(a) can be based on a priori assumptions regarding microphone characteristics, position, speaker location and room acoustics (e.g., no reverberation). Similar to (23), the optimal noise reduction filter is then given by:

$\begin{matrix}{{\overset{\sim}{w}}_{a} = \frac{R_{n_{a}n_{a}}^{- 1}{\overset{\sim}{h}}_{a}}{{\overset{\sim}{h}}_{a}^{H}R_{n_{a}n_{a}}^{- 1}{\overset{\sim}{h}}_{a}}} & (25)\end{matrix}$

The speech estimate, {tilde over (z)}_(a,1), from this MVDR_(a) with an a priori assumed RTF vector is then:

{tilde over (z)}_(a,1)={tilde over (w)}_(a) ^(H)y_(a)  (26)

This conventional formulation of the MVDR_(a) can also be equivalently posed in the pre-whitened-transformed domain (section II-C). As derived in Appendix A, the speech estimate in this domain is given by:

$\begin{matrix}{{\overset{\sim}{z}}_{a,1} = {\frac{l_{M_{a}}}{\left\| {\overset{\sim}{h}}_{a} \right\|}{\underset{\_}{y}}_{a,M_{a}}}} & (27)\end{matrix}$

where l_(M_(a)) is the bottom-right element in L_(a), and ${\underset{\_}{y}}_{a,M_{a}}$ is the last component of the pre-whitened-transformed signals, ${\underset{\_}{y}}_{a}$. In other words, the speech estimate for an MVDR_(a) filter that uses an a priori assumed RTF vector results in a simple scaling of the last component of the pre-whitened-transformed signals. With such a formulation in this domain, this beamforming algorithm can be realized in a distinct set of signal processing blocks as illustrated in FIG. 2.

More specifically, FIG. 2 illustrates transformation block 102 and pre-whitening block 108, as described above with reference to FIG. 1. However, in the example of FIG. 2, only the last row of L_(a) ⁻¹ (see (16)) is used in pre-whitening block 108, resulting in the signal ${\underset{\_}{y}}_{a,M_{a}}$. Also shown are an a priori filter 110, which produces $\frac{l_{M_{a}}}{\left\| {\overset{\sim}{h}}_{a} \right\|}$, and processing block 112, which applies $\frac{l_{M_{a}}}{\left\| {\overset{\sim}{h}}_{a} \right\|}$ to ${\underset{\_}{y}}_{a,M_{a}}$. This application produces an a priori speech estimate {tilde over (z)}_(a,1). The a priori speech estimate, {tilde over (z)}_(a,1), is an estimate of the target sound (e.g., speech) in the received sound signals, based solely on an a priori RTF vector. The a priori RTF vector is generated using assumptions regarding, for example, location of the source of the target sound, characteristics of the microphones (e.g., microphone calibration in regards to gains, phases, etc.), reverberant characteristics of the target sound source, etc. The a priori speech estimate {tilde over (z)}_(a,1) is an example of an a priori estimate of at least one target sound in the received sound signals.
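As a numerical check of the equivalence between (26) and (27), the following sketch computes the a priori speech estimate both with the conventional filter and as a scaling of the last pre-whitened-transformed component. The random data, the QR-based blocking matrix, and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
M_a = 4
h_tilde = np.array([1.0, 0.9 + 0.1j, 0.8 - 0.2j, 0.7 + 0.3j])   # hypothetical a priori RTF
A = rng.standard_normal((M_a, M_a)) + 1j * rng.standard_normal((M_a, M_a))
R_nn = A @ A.conj().T + np.eye(M_a)          # hypothetical LMA noise correlation matrix
y_a = rng.standard_normal(M_a) + 1j * rng.standard_normal(M_a)  # hypothetical LMA frame

# Conventional MVDR_a with the a priori RTF vector, (25) and (26).
Rinv_h = np.linalg.solve(R_nn, h_tilde)
w_tilde = Rinv_h / (h_tilde.conj() @ Rinv_h)
z_conventional = w_tilde.conj() @ y_a

# Pre-whitened-transformed route, (27).
norm_h = np.linalg.norm(h_tilde)
Q, _ = np.linalg.qr(h_tilde[:, None], mode="complete")
T_a = np.hstack([Q[:, 1:], (h_tilde / norm_h)[:, None]])   # T_a = [B_a, b_a]
L_a = np.linalg.cholesky(T_a.conj().T @ R_nn @ T_a)        # as in (16)
y_pw = np.linalg.solve(L_a, T_a.conj().T @ y_a)            # L_a^{-1} T_a^H y_a
l_Ma = np.real(L_a[-1, -1])                                # bottom-right element of L_a
z_prewhitened = (l_Ma / norm_h) * y_pw[-1]
```

Because L_(a) is lower triangular, only its last row of the inverse contributes to the last component of the pre-whitened-transformed signals, which is why a single scaling suffices in this sketch.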

B. Using an Estimated RTF Vector

The RTF vector may also be estimated without reliance on any a priori assumptions and can be used to enhance the speech regardless of the speech source location. One such method is covariance whitening or, equivalently, a method that involves a Generalized Eigenvalue Decomposition (GEVD).

In such examples, a rank-1 matrix approximation problem can be formulated to estimate the RTF vector for a given set of LMA signals such that:

$\begin{matrix}{\min\limits_{{\hat{R}}_{{xa},{r1}}}{\left\| {\left( {R_{y_{a}y_{a}} - R_{n_{a}n_{a}}} \right) - {\hat{R}}_{{xa},{r1}}} \right\|}_{F}^{2}} & (28)\end{matrix}$

where ∥.∥_(F) is the Frobenius norm, and {circumflex over (R)}_(xa,r1) is a rank-1 approximation to (R_(y_(a)y_(a))−R_(n_(a)n_(a))) defined as:

{circumflex over (R)} _(xa,r1)={circumflex over (ϕ)}_(xa,r1) ĥ _(a) ĥ_(a) ^(H)  (29)

where ĥ_(a)=[1 ĥ_(a,2) . . . ĥ_(a,M_(a))]^(T) is the estimated RTF vector.

As opposed to using the raw signal correlation matrices, the estimation problem of (28) can be equivalently formulated in the pre-whitened-transformed domain. In Appendix B, it is shown that the estimated RTF vector is then:

$\begin{matrix}{{\overset{\hat{}}{h}}_{a} = \frac{T_{a}L_{a}p_{\max}}{\eta_{\rho}}} & (30)\end{matrix}$

where p_(max) is a generalized eigenvector of the matrix pencil $\{{\underset{\_}{R}}_{y_{a}y_{a}},{\underset{\_}{R}}_{n_{a}n_{a}}\}$, which as a result of the pre-whitening (${\underset{\_}{R}}_{n_{a}n_{a}} = I_{M_{a}}$) corresponds to the principal (first in this case) eigenvector of ${\underset{\_}{R}}_{y_{a}y_{a}}$, the scaling η_(ρ)=e_(a1) ^(T)T_(a)L_(a)p_(max), and the M_(a)×1 vector e_(a1)=[1 0 . . . 0]^(T). The resulting MVDR_(a) using this estimated RTF vector is now given by:

$\begin{matrix}{{\overset{\hat{}}{w}}_{a} = \frac{R_{n_{a}n_{a}}^{- 1}{\overset{\hat{}}{h}}_{a}}{{\overset{\hat{}}{h}}_{a}^{H}R_{n_{a}n_{a}}^{- 1}{\overset{\hat{}}{h}}_{a}}} & (31)\end{matrix}$

As was done in section III-A, this filter based on estimated quantities can also be reformulated in the pre-whitened-transformed domain. Leaving the derivations once again to Appendix B, the corresponding speech estimate using the estimated RTF vector is:

$\begin{matrix}{{\hat{z}}_{a,1} = {{\eta_{\rho}p_{\max}^{H}L_{a}^{- 1}T_{a}^{H}y_{a}} = {\eta_{\rho}p_{\max}^{H}{\underset{\_}{y}}_{a}}}} & (32)\end{matrix}$

where η_(ρ) ^(*)p_(max) can be considered as the pre-whitened-transformed filter (where {.}* is the complex conjugate), which can be used to directly filter the pre-whitened-transformed signals, ${\underset{\_}{y}}_{a}$. These operations can also be realized in a distinct set of signal processing blocks, as illustrated in FIG. 3.

More specifically, FIG. 3 illustrates transformation block 102 and pre-whitening block 108, as described above with reference to FIG. 1, which produce pre-whitened-transformed signals. Also shown is block 114, which filters the pre-whitened-transformed signals in accordance with η_(ρ) ^(*)p_(max) (i.e., 114 represents the Hermitian transposed pre-whitened-transformed filter). The output of the pre-whitened-transformed filter 114 is a direct speech estimate, {circumflex over (z)}_(a,1) (i.e., (32), above).

The direct speech estimate, {circumflex over (z)}_(a,1), is an estimate of the target sound (e.g., speech) in the received sound signals, based solely on an estimated RTF vector. The estimated RTF vector is generated using real-time estimates of, for example, the location of the source of the target sound, reverberant characteristics of the target sound source, etc. The direct speech estimate, {circumflex over (z)}_(a,1), is an example of a direct estimate of at least one target sound in the received sound signals.
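The estimation of (30) and the direct speech estimate of (32) can be sketched as follows. For clarity the example uses exact (not sample) correlation matrices built from a hypothetical true RTF vector, and it uses an ordinary eigenvalue decomposition of the pre-whitened speech-plus-noise matrix; these choices and all names are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(6)
M_a = 4
h_true = np.array([1.0, 0.6 + 0.5j, -0.3 + 0.8j, 0.2 - 0.7j])   # hypothetical true RTF
h_tilde = np.ones(M_a, dtype=complex)        # hypothetical (mismatched) a priori RTF
A = rng.standard_normal((M_a, M_a)) + 1j * rng.standard_normal((M_a, M_a))
R_nn = A @ A.conj().T + np.eye(M_a)
R_yy = 3.0 * np.outer(h_true, h_true.conj()) + R_nn   # exact statistics, speech power 3

# Transformation and pre-whitening built from the a priori RTF vector.
Q, _ = np.linalg.qr(h_tilde[:, None], mode="complete")
T_a = np.hstack([Q[:, 1:], (h_tilde / np.linalg.norm(h_tilde))[:, None]])
L_a = np.linalg.cholesky(T_a.conj().T @ R_nn @ T_a)
W = np.linalg.solve(L_a, T_a.conj().T)                # L_a^{-1} T_a^H
R_yy_pw = W @ R_yy @ W.conj().T                       # pre-whitened speech-plus-noise

# Principal eigenvector of the pre-whitened speech-plus-noise matrix.
eigvals, eigvecs = np.linalg.eigh(R_yy_pw)            # eigenvalues in ascending order
p_max = eigvecs[:, -1]

v = T_a @ L_a @ p_max
eta = v[0]                                            # e_a1^T T_a L_a p_max
h_hat = v / eta                                       # estimated RTF vector, (30)

# Direct speech estimate computed two ways: via (31), and via (32).
y_a = rng.standard_normal(M_a) + 1j * rng.standard_normal(M_a)
Rinv_h = np.linalg.solve(R_nn, h_hat)
w_hat = Rinv_h / (h_hat.conj() @ Rinv_h)
z_conventional = w_hat.conj() @ y_a
z_prewhitened = eta * (p_max.conj() @ (W @ y_a))
```

With exact statistics the estimated RTF vector recovers the true one even though the transformation was built from a mismatched a priori vector; with sample estimates the recovery would only be approximate.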

C. Integrated MVDR_(a) Beamformer

Described above are two general MVDR approaches, one that imposes a priori assumptions for the definition of the RTF vector in the MVDR filter, and another that involves an estimation of this RTF vector. In conventional arrangements, a choice typically has to be made between one of these approaches with an acceptance of their inevitable drawbacks. However, in accordance with the integrated noise reduction techniques presented herein, both approaches are integrated into one global filter, referred to herein as an “integrated MVDR_(a) beamformer,” that exploits the benefits of each approach.

In general, the integrated MVDR_(a) beamformer provides for integrated tunings which allow different “weights” to be applied to each of (1) an a priori assumed representation of target sound within received sound signals (e.g., an a priori estimate of at least one target sound in the received sound signals), and (2) an estimated representation of the target sound within received sound signals (e.g., a direct estimate of at least one target sound in the received sound signals). The weights applied to each of the a priori assumed representation of the target sound and the estimated representation of the target sound are selected based on “confidence measures” associated with each of the a priori assumed representation of the target sound and the estimated representation of the target sound, respectively.

For instance, with the integrated MVDR_(a) beamformer, if the speech source moves outside of the direction defined by an a priori assumed RTF vector, more weight can be given to an estimated RTF vector to account for the loss in performance that would otherwise result from using the a priori assumed RTF vector alone. On the other hand, if the estimated RTF vector becomes unreliable, less weight can be given thereto and the system can revert to using the a priori assumed RTF vector, which may have an improved performance if the speech source is indeed in the direction defined by the a priori assumed RTF vector. Combination/mixing of the a priori assumed RTF vector and the estimated RTF vector is also possible. That is, the tuning parameters can achieve multiple beamformers, i.e., one that relies on a priori assumptions alone, one that relies on estimated quantities alone, or a mixture of both.

One particular tuning of interest may be to place a large weight on an a priori assumed RTF vector, but to weight an estimated RTF vector only when appropriate. This represents a mechanism for reverting to an a priori assumed RTF vector when the estimated RTF vector is unreliable.

In the following, the integrated MVDR_(a) beamformer is briefly derived. If the case is considered where {tilde over (h)}_(a) is defined according to a priori assumptions and ĥ_(a) is estimated from (86), an integrated MVDR_(a) cost function can be given as:

$\begin{matrix}{{\min\limits_{w_{a}}{w_{a}^{H}R_{n_{a}n_{a}}w_{a}}} + {\alpha\left| {{w_{a}^{H}{\overset{\sim}{h}}_{a}} - 1} \right|^{2}} + {\beta\left| {{w_{a}^{H}{\hat{h}}_{a}} - 1} \right|^{2}}} & (33)\end{matrix}$

where α∈[0, ∞] and β∈[0, ∞] are tuning parameters that control how much the respective RTF vectors (i.e., the a priori assumed RTF vector and the estimated RTF vector) are weighted. This cost function is the combination of that of an MVDR_(a) (as in (22)) defined by {tilde over (h)}_(a) and another defined by ĥ_(a), except that the constraints have been softened by α and β.

The solution to (33) is given by:

w _(a,int) =f _(pr)(α, β){tilde over (w)} _(a) +f _(est)(α,β)ŵ_(a)  (34)

where {tilde over (w)}_(a) and ŵ_(a) are defined in (25) and (31) respectively, and:

$\begin{matrix}{{f_{pr}\left( {\alpha,\beta} \right)} = \left\lbrack \frac{\alpha{k_{dd}\left\lbrack {1 + {\beta\left( {k_{pp} - k_{dp}} \right)}} \right\rbrack}}{{\alpha k_{dd}} + {\beta k_{pp}} + {\alpha{\beta\left( {{k_{pp}k_{dd}} - {k_{dp}k_{pd}}} \right)}} + 1} \right\rbrack} & (35) \\{{f_{est}\left( {\alpha,\beta} \right)} = \left\lbrack \frac{\beta{k_{pp}\left\lbrack {1 + {\alpha\left( {k_{dd} - k_{pd}} \right)}} \right\rbrack}}{{\alpha k_{dd}} + {\beta k_{pp}} + {\alpha{\beta\left( {{k_{pp}k_{dd}} - {k_{dp}k_{pd}}} \right)}} + 1} \right\rbrack} & (36)\end{matrix}$

with the constants:

$\begin{matrix}{{{k_{dd} = {{\overset{\sim}{h}}_{a}^{H}R_{n_{a}n_{a}}^{- 1}{\overset{\sim}{h}}_{a}}};{k_{pp} = {{\hat{h}}_{a}^{H}R_{n_{a}n_{a}}^{- 1}{\hat{h}}_{a}}};}{{k_{dp} = {{\overset{\sim}{h}}_{a}^{H}R_{n_{a}n_{a}}^{- 1}{\hat{h}}_{a}}};{k_{pd} = {{\hat{h}}_{a}^{H}R_{n_{a}n_{a}}^{- 1}{\overset{\sim}{h}}_{a}}}}} & (37)\end{matrix}$

This integrated MVDR_(a) beamformer reveals that the MVDR_(a) beamformer based on a priori assumptions from (25) and that which is based on estimated quantities from (31) can be combined according to the functions f_(pr)(α, β) and f_(est)(α, β) respectively.
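The weighting functions of (35) and (36) can be checked numerically against a direct minimization of the cost function (33). In the sketch below, the direct minimizer is obtained by setting the gradient of (33) to zero, which gives a linear system in w_(a); that linear system, the helper names, and the test values are assumptions of this example and are not taken from the derivation referenced above.

```python
import numpy as np

def mvdr_weights(R_nn, h):
    """Return R_nn^{-1} h / (h^H R_nn^{-1} h), as in (25) and (31)."""
    Rinv_h = np.linalg.solve(R_nn, h)
    return Rinv_h / (h.conj() @ Rinv_h)

def integration_weights(R_nn, h_tilde, h_hat, alpha, beta):
    """Return f_pr and f_est of (35)-(36), using the constants of (37)."""
    Ri_t = np.linalg.solve(R_nn, h_tilde)
    Ri_h = np.linalg.solve(R_nn, h_hat)
    k_dd = h_tilde.conj() @ Ri_t
    k_pp = h_hat.conj() @ Ri_h
    k_dp = h_tilde.conj() @ Ri_h
    k_pd = h_hat.conj() @ Ri_t
    den = alpha * k_dd + beta * k_pp + alpha * beta * (k_pp * k_dd - k_dp * k_pd) + 1
    f_pr = alpha * k_dd * (1 + beta * (k_pp - k_dp)) / den
    f_est = beta * k_pp * (1 + alpha * (k_dd - k_pd)) / den
    return f_pr, f_est

rng = np.random.default_rng(7)
M_a = 4
A = rng.standard_normal((M_a, M_a)) + 1j * rng.standard_normal((M_a, M_a))
R_nn = A @ A.conj().T + np.eye(M_a)                  # hypothetical noise correlation
h_tilde = np.ones(M_a, dtype=complex)                # hypothetical a priori RTF vector
h_hat = np.array([1.0, 0.6 + 0.5j, -0.3 + 0.8j, 0.2 - 0.7j])   # hypothetical estimate
alpha, beta = 5.0, 2.0                               # hypothetical tuning parameters

f_pr, f_est = integration_weights(R_nn, h_tilde, h_hat, alpha, beta)
w_int = f_pr * mvdr_weights(R_nn, h_tilde) + f_est * mvdr_weights(R_nn, h_hat)  # (34)

# Zero-gradient condition of (33):
# (R_nn + alpha h~ h~^H + beta h^ h^^H) w = alpha h~ + beta h^
lhs = (R_nn + alpha * np.outer(h_tilde, h_tilde.conj())
       + beta * np.outer(h_hat, h_hat.conj()))
w_direct = np.linalg.solve(lhs, alpha * h_tilde + beta * h_hat)

# Large alpha with beta = 0 approaches the purely a priori beamformer.
f_pr_lim, f_est_lim = integration_weights(R_nn, h_tilde, h_hat, 1e9, 0.0)
```

In this sketch the weighted combination of (34) coincides with the direct minimizer, and the last line illustrates the tuning in which the system relies on the a priori assumed RTF vector alone.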

As in the previous sections, this integrated beamformer can also be expressed in the pre-whitened-transformed domain as follows:

$\begin{matrix}{w_{a,{int}} = {{{f_{pr}\left( {\alpha,\beta} \right)}T_{a}L_{a}^{- H}\frac{l_{M_{a}}}{\left\| {\overset{\sim}{h}}_{a} \right\|}} + {{f_{est}\left( {\alpha,\beta} \right)}T_{a}L_{a}^{- H}\eta_{\rho}p_{\max}}}} & (38)\end{matrix}$

and with the constants equivalently, but alternatively defined as:

$\begin{matrix}{{k_{dd} = {{\underset{\_}{\overset{\sim}{h}}}_{a}^{H}{\underset{\_}{\overset{\sim}{h}}}_{a}}};{k_{pp} = {{\underset{\_}{\hat{h}}}_{a}^{H}{\underset{\_}{\hat{h}}}_{a}}};{k_{dp} = {{\underset{\_}{\overset{\sim}{h}}}_{a}^{H}{\underset{\_}{\hat{h}}}_{a}}};{k_{pd} = {{\underset{\_}{\hat{h}}}_{a}^{H}{\underset{\_}{\overset{\sim}{h}}}_{a}}}} & (39)\end{matrix}$

where the pre-whitened-transformed RTF vectors {tilde over (h)}_(a) and ĥ_(a) are given in (79) and (88) respectively.

The resulting speech estimate from this integrated beamformer is then given by:

$\begin{matrix}{\begin{aligned}{\hat{z}}_{a,{int}} &= {f_{pr}^{*}\left( {\alpha,\beta} \right)}\frac{l_{M_{a}}}{{\tilde{h}}_{a}}{\underline{y}}_{a,M_{a}} + {f_{est}^{*}\left( {\alpha,\beta} \right)}\eta_{p}p_{\max}^{H}{\underline{y}}_{a} \\ &= {f_{pr}^{*}\left( {\alpha,\beta} \right)}{\tilde{z}}_{a,1} + {f_{est}^{*}\left( {\alpha,\beta} \right)}{\hat{z}}_{a,1}\end{aligned}} & (40)\end{matrix}$

The benefit of this pre-whitened-transformed domain is apparent where, with such an integrated beamformer of (38), {tilde over (w)}_(a,M) _(a) and {circumflex over (w)}_(a) can be directly used to filter the pre-whitened-transformed signals, and then combined with the appropriate weightings as defined by the functions f_(pr)(α, β) and f_(est)(α, β), to yield the respective speech estimate. These functions f_(pr)(α, β) and f_(est)(α, β) can be tuned such as to emphasize the result from an MVDR beamformer that uses either an a priori assumed RTF vector or an estimated RTF vector. This results in a digital signal processing scheme as depicted in FIG. 4.

More specifically, FIG. 4 is a block diagram of an integrated MVDR_(a) beamformer 125 in accordance with embodiments presented herein. The integrated MVDR_(a) beamformer 125 comprises a plurality of processing blocks, which include transformation block 102 and pre-whitening block 108. As described above with reference to FIG. 1, transformation block 102 and pre-whitening block 108 produce signals 109 in the pre-whitened-transformed domain (pre-whitened-transformed signals).

Also shown in FIG. 4 are two processing branches 113(1) and 113(2) that each operate based on all or part of the pre-whitened-transformed signals 109. The first processing branch 113(1) includes an a priori filter 110, which produces

$\frac{l_{M_{a}}}{{\overset{\sim}{h}}_{a}}$

and a processing block 112 which applies

$\frac{l_{M_{a}}}{{\overset{\sim}{h}}_{a}}$

to y _(a,M) _(a). The application of

$\frac{l_{M_{a}}}{{\overset{\sim}{h}}_{a}}$

to y _(a,M) _(a) generates the a priori speech estimate {tilde over (z)}_(a,1), which is generated based solely on an a priori RTF vector (i.e., an estimate of the speech in the received sound signals, based solely on a priori assumptions such as microphone characteristics, source location, and reverberant characteristics of the target sound (e.g., speech) source). In other words, application of

$\frac{l_{M_{a}}}{{\overset{\sim}{h}}_{a}}$

to y _(a,M) _(a) generates an a priori estimate of at least one target sound in the received sound signals.

The first branch 113(1) also comprises a first weighting block 116. The first weighting block 116 is configured to weight the a priori speech estimate, {tilde over (z)}_(a,1), in accordance with the complex conjugate of the function f_(pr)(α, β) (i.e., (35) and (40), above). More generally, the first weighting block 116 is configured to weight the a priori speech estimate, {tilde over (z)}_(a,1), in accordance with a cost function controlled by a plurality of tuning parameters (e.g., (α, β)). The tuning parameters of the cost function (e.g., f_(pr)(α, β)) are set based on one or more confidence measures 118 generated for the a priori speech estimate, {tilde over (z)}_(a,1). The one or more confidence measures 118 represent an assessment or estimate of the accuracy/reliability of the a priori speech estimate, {tilde over (z)}_(a,1), and hence the accuracy of the a priori RTF vector used to generate that estimate. The first weighting block 116 generates a weighted a priori speech estimate, shown in FIG. 4 by arrow 119.

The second branch 113(2) includes a pre-whitened-transformed filter 114, which filters the pre-whitened-transformed signals in accordance with (32). The output of the pre-whitened-transformed filter 114 is a direct speech estimate, {circumflex over (z)}_(a,1), that is generated based solely on an estimated RTF vector (i.e., an estimate of the speech in the received sound signals, which takes into consideration microphone characteristics and may contain information such as the location and some reverberant characteristics of the speech source). In other words, the direct speech estimate, {circumflex over (z)}_(a,1), is an example of a direct estimate of at least one target sound in the received sound signals.

The second branch 113(2) also comprises a second weighting block 120. The second weighting block 120 is configured to weight the direct speech estimate, {circumflex over (z)}_(a,1), in accordance with the complex conjugate of the function f_(est)(α, β) (i.e., (36) and (40), above). More generally, the second weighting block 120 is configured to weight the direct speech estimate, {circumflex over (z)}_(a,1), in accordance with a cost function controlled by a plurality of tuning parameters (e.g., (α, β)). The tuning parameters of the cost function (e.g., f_(est)(α, β)) are set based on one or more confidence measures 122 generated for the direct speech estimate, {circumflex over (z)}_(a,1). The one or more confidence measures 122 represent an assessment or estimate of the accuracy/reliability of the direct speech estimate, {circumflex over (z)}_(a,1), and hence the accuracy of the estimated RTF vector used to generate that estimate. The second weighting block 120 generates a weighted direct speech estimate, shown in FIG. 4 by arrow 123.

FIG. 4 also illustrates processing block 124, which integrates/combines the weighted a priori speech estimate 119 and the weighted direct speech estimate 123. The combination of the weighted a priori speech estimate 119 and the weighted direct speech estimate 123 is referred to as an integrated speech estimate, {circumflex over (z)}_(a,int) (i.e., (40), above). The integrated speech estimate may be used for subsequent processing in the device (e.g., auditory prosthesis).

IV. MVDR with a LMA and XM Signals (MVDR_(a,e))

Section III, above, illustrates an embodiment in which the integrated beamformer operates based on local microphone array (LMA) signals. As noted above, LMA signals are generated by a local microphone array (LMA) that is part of the device that performs the integrated noise reduction techniques. In the case of auditory prostheses, such as cochlear implants, the LMA is worn on the recipient.

As described further below, the integrated noise reduction techniques described herein can be extended to include external microphone (XM) signals, in addition to the LMA signals. These XM signals are generated by one or more external microphones (XMs) that are not part of the device that performs the integrated noise reduction techniques, but that can nevertheless communicate with the device (e.g., via a wireless connection). The external microphones may be any type of microphone (e.g., microphones in a wireless microphone device, microphones in a separate computing device (e.g., phone, laptop, tablet, etc.), microphones in another auditory prosthesis, microphones in a conference phone system, microphones in a hands-free system, etc.) for which the location of the microphone(s) is unknown relative to the microphones of the LMA. In other words, as used herein, an external microphone may be any microphone that has an unknown location, which may change over time, with respect to the local microphone array.

Extending the techniques herein to the use of LMA signals and XM signals, the integrated beamformer is referred to as the MVDR_(a,e):

$\begin{matrix}{\begin{matrix}\min\limits_{w} & {w^{H}R_{nn}w} \\{s.t.} & {{w^{H}h} = 1}\end{matrix}} & (41)\end{matrix}$

where h is the RTF vector ((4), above) that includes M_(a) components corresponding to the LMA, h_(a), and M_(e) components corresponding to the XMs, h_(e), and R_(nn) is the (M_(a)+M_(e))×(M_(a)+M_(e)) noise correlation matrix:

$\begin{matrix}{{R_{nn} =}\begin{bmatrix}\begin{matrix}R_{n_{a}n_{a}} \\\left( {M_{a} \times M_{a}} \right)\end{matrix} & \begin{matrix}R_{n_{a}n_{e}} \\\left( {M_{a} \times M_{e}} \right)\end{matrix} \\\begin{matrix}R_{n_{a}n_{e}}^{H} \\\left( {M_{e} \times M_{a}} \right)\end{matrix} & \begin{matrix}R_{n_{e}n_{e}} \\\left( {M_{e} \times M_{e}} \right)\end{matrix}\end{bmatrix}} & (42)\end{matrix}$

where the upper left block, R_(n) _(a) _(n) _(a), is the noise correlation matrix of the LMA signals, R_(n) _(a) _(n) _(e) is the noise cross-correlation between the LMA signals and the XM signals, and R_(n) _(e) _(n) _(e) is the noise correlation matrix of the XM signals. Similar to (23), the solution to (41) is given by:

$\begin{matrix}{w = \frac{R_{nn}^{- 1}h}{h^{H}R_{nn}^{- 1}h}} & (43)\end{matrix}$

with the speech estimate, z=w^(H)y. Since, as noted above, the XMs have an unknown location, which may change over time, with respect to the local microphone array, generally no a priori assumptions can be made about the location of the XMs. Consequently, there are two potential approaches that can be taken in order to find h, namely: (i) only the missing component of the RTF vector corresponding to the XM signals is estimated, while the a priori assumed RTF vector for the LMA signals is preserved; or (ii) the entire RTF vector is estimated for the LMA signals and the XM signals. In sections IV-A and IV-B, strategies for both approaches are briefly described.
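The closed-form solution (43) can be illustrated numerically. The following is a minimal sketch only, assuming NumPy is available; the microphone counts, the random noise correlation matrix, and the random RTF vector are hypothetical test values and are not taken from the embodiments described herein.

```python
import numpy as np

def mvdr_weights(R_nn, h):
    """MVDR filter of (43): w = R_nn^{-1} h / (h^H R_nn^{-1} h)."""
    Rinv_h = np.linalg.solve(R_nn, h)      # R_nn^{-1} h without forming the inverse
    return Rinv_h / (h.conj() @ Rinv_h)

# Hypothetical dimensions: M_a local microphones plus M_e external microphones.
M_a, M_e = 3, 1
M = M_a + M_e
rng = np.random.default_rng(0)

# Hypothetical Hermitian positive-definite noise correlation matrix.
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_nn = A @ A.conj().T + M * np.eye(M)

# Hypothetical RTF vector, normalised so the reference element equals 1.
h = np.concatenate(([1.0], rng.standard_normal(M - 1) + 1j * rng.standard_normal(M - 1)))

w = mvdr_weights(R_nn, h)
constraint = w.conj() @ h                    # w^H h, should equal 1
noise_power = (w.conj() @ R_nn @ w).real     # minimised output noise power

# Any other filter meeting the constraint should not do better, e.g. a matched filter.
w_matched = h / (h.conj() @ h)
noise_power_matched = (w_matched.conj() @ R_nn @ w_matched).real
```

In this sketch the distortionless constraint w^(H)h=1 holds up to numerical precision, and the output noise power of the MVDR filter does not exceed that of the simple matched filter, which satisfies the same constraint.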

A. Using a Partial a Priori Assumed RTF Vector and Partial Estimated RTF Vector

As previously mentioned, one option for the definition of h for the MVDR_(a,e) is such that the a priori RTF vector for the LMA signals, {tilde over (h)}_(a), is preserved and only the RTF vector for the XM signals is estimated. Such an RTF vector will therefore be defined as follows:

$\begin{matrix}{\overset{\sim}{h} = \left\lbrack {\overset{\sim}{h}}_{a}^{T} \middle| {\hat{h}}_{e}^{T} \right\rbrack^{T}} & (44)\end{matrix}$

It should be noted that although {tilde over (h)} partially contains an estimated RTF vector, this is done with respect to the a priori assumptions set by {tilde over (h)}_(a), and hence the notation for {tilde over (h)} is kept to be that of an a priori RTF vector (this is further elaborated upon in section IV-E). A method to compute {tilde over (h)}_(e) in the case of one XM using the cross-correlation between the external microphone and a speech reference provided by (26) using a GEVD is outlined below.

As in (28), a rank-1 matrix approximation problem can be formulated to estimate an entire RTF vector for a given set of microphone signals such that:

$\begin{matrix}{\min\limits_{{\tilde{R}}_{x,{r1}}}\left\| {\left( {R_{yy} - R_{nn}} \right) - {\tilde{R}}_{x,{r1}}} \right\|_{F}^{2}} & (45)\end{matrix}$

where {tilde over (R)}_(x,r1) is a rank-1 approximation to R_(xx) (recall (8)). The a priori assumed RTF vector for the LMA signals can also be included in the definition of {tilde over (R)}_(x,r1), which is hence given by:

$\begin{matrix}{{\tilde{R}}_{x,{r1}} = {{\hat{\Phi}}_{x,{r1}}\begin{bmatrix}{\tilde{h}}_{a} \\{\hat{h}}_{e}\end{bmatrix}\begin{bmatrix}{\tilde{h}}_{a}^{H} & {\hat{h}}_{e}^{H}\end{bmatrix}}} & (46)\end{matrix}$

As opposed to using the raw signal correlation matrices, the estimation problem of (45) can be equivalently formulated in the pre-whitened-transformed domain. In Appendix C, it is shown that the estimated RTF vector could be found from a GEVD on the matrix pencil {J^(T) R _(yy)J, J^(T) R _(nn) _(λ) J}, where the selection matrix, J=[0_((M_e+1)×(M_a−1))|I_(M_e+1)]^(T). As a result of the pre-whitening (R _(nn)=I_(M_a+M_e)), this GEVD can consequently be computed from the EVD of J^(T) R _(yy) J, which is a lower order correlation matrix, of dimensions (M_(e)+1)×(M_(e)+1), that could be constructed from the last (M_(e)+1) elements of the pre-whitened-transformed signals, namely the element corresponding to the last element of the LMA, y _(a,M) _(a), and those corresponding to the XM signals, y _(e). The resulting RTF vector for the XM signals is then defined from the corresponding principal (first in this case) eigenvector, v_(max):

$\begin{matrix}{{\overset{˜}{h}}_{e} = {\frac{{\overset{˜}{h}}_{a}}{l_{M_{a}}v_{1}}J_{e}^{T}{TL}\; J\; v_{\max}}} & (47)\end{matrix}$

where the selection matrix, J_(e)=[0_(M_e×M_a)|I_(M_e)]^(T).

Finally, this estimate is then used to compute the corresponding MVDR_(a,e) filter with an a priori assumed RTF vector and a partially estimated RTF vector as:

$\begin{matrix}{\overset{\sim}{w} = \frac{R_{nn}^{- 1}\overset{˜}{h}}{{\overset{˜}{h}}^{H}R_{nn}^{- 1}\overset{˜}{h}}} & (48)\end{matrix}$

where {tilde over (h)} as defined in (44) can be equivalently represented as:

$\begin{matrix}{\tilde{h} = {\frac{{\tilde{h}}_{a}}{l_{M_{a}}v_{1}}{TL}\; J\; v_{\max}}} & (49)\end{matrix}$

As was done in section III, this filter can also be reformulated in the pre-whitened-transformed domain. Leaving the derivations once again to Appendix C, the corresponding speech estimate is found to be:

$\begin{matrix}{{\overset{˜}{z}}_{1} = {\frac{l_{M_{a}}v_{1}}{{\overset{˜}{h}}_{a}}{v_{\max}^{H}\begin{bmatrix}{\underset{\_}{y}}_{a,M_{a}} \\{\underset{\_}{y}}_{e}\end{bmatrix}}}} & (50)\end{matrix}$

where

$\frac{l_{M_{a}}v_{1}^{*}}{{\overset{˜}{h}}_{a}}v_{\max}$

can be considered as a pre-whitened-transformed filter, which can be used to directly filter the last (M_(e)+1) elements of the pre-whitened-transformed signals, i.e., y _(a,M) _(a) and y _(e).

More specifically, FIG. 5 is a block diagram illustrating a transformation block 502 representing the first transformation of section II-B, in which the LMA signals pass through a blocking matrix 504 and a matched filter 506, analogous to the first stage of a GSC. The XM signals are unaltered. The pre-whitening block 508 represents the pre-whitening operation. The output of the pre-whitening block 508 is signals in the pre-whitened-transformed domain, referred to as pre-whitened-transformed signals 509.

Also shown in FIG. 5 is filter 530 (i.e., (50), above), which uses the pre-whitened-transformed signals 509 to generate an a priori speech estimate, {tilde over (z)}₁. As such, the a priori speech estimate, {tilde over (z)}₁, is a speech estimate using a partial a priori assumed RTF vector and partial estimated RTF vector (i.e., using a priori assumptions for the definition of the RTF vector for the LMA signals, while estimating only the RTF vector for the XM signals). Stated differently, the a priori speech estimate, {tilde over (z)}₁, is generated from assumptions such as microphone characteristics, location and reverberant characteristics of the speech within the sound signals detected by the LMA, and based on a real-time estimate of speech within the sound signals detected by the XM, which adheres to the same assumptions used for the LMA. The a priori speech estimate, {tilde over (z)}₁, is an example of an a priori estimate of at least one target sound in the received sound signals.

In the case where the RTF vector for both the LMA and XM signals is to be estimated, a variation of (45) is considered:

$\begin{matrix}{\min\limits_{{\hat{R}}_{x,{r1}}}\left\| {\left( {R_{yy} - R_{nn}} \right) - {\hat{R}}_{x,{r1}}} \right\|_{F}^{2}} & (51)\end{matrix}$

where {circumflex over (R)}_(x,r1) is a rank-1 approximation to R_(xx) (without any a priori information):

$\begin{matrix}{{\hat{R}}_{x,{r1}} = {{\hat{\Phi}}_{x,{r1}}\hat{h}{\hat{h}}^{H}} = {{\hat{\Phi}}_{x,{r1}}\begin{bmatrix}{\hat{q}}_{a} \\{\hat{q}}_{e}\end{bmatrix}\begin{bmatrix}{\hat{q}}_{a}^{H} & {\hat{q}}_{e}^{H}\end{bmatrix}}} & (52)\end{matrix}$

with {circumflex over (q)}_(a) the estimated RTF vector for the LMA signals and {circumflex over (q)}_(e) the estimated RTF vector for the XM signals.

Once again, it will be convenient to re-frame the problem in the pre-whitened-transformed domain. From the derivations in Appendix D, the estimated RTF vector is given by:

$\begin{matrix}{\hat{h} = {\begin{bmatrix}{\hat{q}}_{a} \\{\hat{q}}_{e}\end{bmatrix} = \frac{{TL}\; q_{\max}}{\eta_{q}}}} & (53)\end{matrix}$

where q_(max) is a generalized eigenvector of the matrix pencil {R _(yy), R _(nn)}, which as a result of the pre-whitening (R _(nn)=I_(M_a+M_e)) corresponds to the principal (first in this case) eigenvector of R _(yy), η_(q)=e_(x1)^(T)TL q_(max) and e_(x1)=[1 0 . . . 0|0 . . . 0]^(T). The estimated RTF vector can therefore be used as an alternative to h for the MVDR_(a,e):

$\begin{matrix}{\hat{w} = \frac{R_{nn}^{- 1}\hat{h}}{{\hat{h}}^{H}R_{nn}^{- 1}\hat{h}}} & (54)\end{matrix}$

As derived in Appendix D, the corresponding speech estimate in the pre-whitened-transformed domain is given by:

$\begin{matrix}{\begin{aligned}{\hat{z}}_{1} &= \eta_{q}q_{\max}^{H}\underset{\underline{y}}{\underbrace{L_{\lambda}^{- 1}T^{H}y}} \\ &= \eta_{q}q_{\max}^{H}\underline{y}\end{aligned}} & (55)\end{matrix}$

where η*_(q)q_(max) can be considered as a pre-whitened-transformed filter, which can be used to directly filter the pre-whitened-transformed signals, y.
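The estimation of an RTF vector from the principal eigenvector of a pre-whitened correlation matrix can be sketched as follows. This is a simplified illustration under stated assumptions, not the full derivation of Appendix D: the transformation T is taken to be the identity, the pre-whitening factor L is taken to be the Cholesky factor of the noise correlation matrix, the first microphone is taken as the reference, and all signals and matrices are hypothetical test values.

```python
import numpy as np

def estimate_rtf(R_yy, R_nn):
    """Covariance-whitening RTF estimate (assumes T = I and R_nn = L L^H)."""
    L = np.linalg.cholesky(R_nn)                  # lower-triangular whitening factor
    L_inv = np.linalg.inv(L)
    R_yy_w = L_inv @ R_yy @ L_inv.conj().T        # pre-whitened correlation matrix
    _, eigvecs = np.linalg.eigh(R_yy_w)           # eigenvalues in ascending order
    q_max = eigvecs[:, -1]                        # principal eigenvector
    Lq = L @ q_max
    eta_q = Lq[0]                                 # e_x1^T L q_max (reference element)
    return Lq / eta_q, q_max, eta_q, L_inv

rng = np.random.default_rng(1)
M = 4                                             # hypothetical total microphone count
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_nn = A @ A.conj().T + M * np.eye(M)             # hypothetical noise correlation
h_true = np.concatenate(([1.0], rng.standard_normal(M - 1) + 1j * rng.standard_normal(M - 1)))
R_yy = 2.0 * np.outer(h_true, h_true.conj()) + R_nn   # rank-1 speech plus noise

h_hat, q_max, eta_q, L_inv = estimate_rtf(R_yy, R_nn)

# Speech estimate in the pre-whitened domain, in the form of (55).
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
z_hat = eta_q * (q_max.conj() @ (L_inv @ y))

# Cross-check against the MVDR filter of (54) applied to the raw signals.
v = np.linalg.solve(R_nn, h_hat)
w_hat = v / (h_hat.conj() @ v)
z_ref = w_hat.conj() @ y
```

With an exactly rank-1 speech correlation matrix, as constructed here, the estimated RTF vector coincides with the true one, and filtering the pre-whitened signals with η*_(q)q_(max) gives the same output as applying the filter of (54) to the unprocessed signals.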

More specifically, FIG. 6 is a block diagram illustrating a transformation block 502 representing the first transformation of section II-B, in which the LMA signals pass through a blocking matrix 504 and a matched filter 506, analogous to the first stage of a GSC. The XM signals are unaltered. The pre-whitening block 508 represents the pre-whitening operation. The output of the pre-whitening block 508 is signals in the pre-whitened-transformed domain, referred to as pre-whitened-transformed signals 509.

Also shown in FIG. 6 is filter 532 (i.e., (55), above), which uses the pre-whitened-transformed signals 509 to generate a direct speech estimate, {circumflex over (z)}₁. As such, the direct speech estimate, {circumflex over (z)}₁, is a speech estimate using an estimated RTF vector including both the LMA and XM signals. Stated differently, the direct speech estimate, {circumflex over (z)}₁, is generated from a real-time estimate of the speech within the sound signals detected by both the LMA and XM, which takes into consideration microphone characteristics and may contain information such as the location and some reverberant characteristics of the target sound. The direct speech estimate, {circumflex over (z)}₁, is an example of a direct estimate of at least one target sound in the received sound signals.

B. Integrated Beamformer

In the case of the integrated MVDR_(a) for the LMA signals in section III-C, two general approaches for designing the beamformer were considered: one that imposes a priori assumptions for the definition of the RTF vector in the MVDR filter, and another that involves an estimation of this RTF vector. For the MVDR_(a,e), two analogous approaches can also be considered: one that imposes a priori assumptions for the definition of the RTF vector for the LMA signals, while estimating only the RTF vector for the XM signals, and another that estimates the entire RTF vector including both the LMA and XM signals. Although both approaches involve an estimation, in the approach where only the RTF vector for the XM signals is estimated, the estimation is done in accordance with the a priori assumptions set by the LMA. Therefore, just as in the integrated MVDR_(a), two general approaches to designing the MVDR_(a,e) according to either a priori assumptions or full estimation can be considered. Consequently, an integrated MVDR_(a,e) beamformer can also be derived in order to integrate the two general approaches. The resulting cost function is:

$\begin{matrix}{{\min\limits_{w}\; w^{H}R_{nn}w} + {\alpha\left| {{w^{H}\tilde{h}} - 1} \right|^{2}} + {\beta\left| {{w^{H}\hat{h}} - 1} \right|^{2}}} & (56)\end{matrix}$

where {tilde over (h)} is defined from (49) and ĥ from (53). The solution is then:

$\begin{matrix}{w_{int} = {{{g_{pr}\left( {\alpha,\beta} \right)}\tilde{w}} + {{g_{est}\left( {\alpha,\beta} \right)}\hat{w}}}} & (57)\end{matrix}$

where {tilde over (w)} and ŵ are given in (48) and (54) respectively.

$\begin{matrix}{{g_{pr}\left( {\alpha,\beta} \right)} = \left\lbrack \frac{\alpha{k_{hh}\left\lbrack {1 + {\beta\left( {k_{qq} - k_{hq}} \right)}} \right\rbrack}}{{\alpha k_{hh}} + {\beta k_{qq}} + {\alpha{\beta\left( {{k_{qq}k_{hh}} - {k_{hq}k_{qh}}} \right)}} + 1} \right\rbrack} & (58) \\{{g_{est}\left( {\alpha,\beta} \right)} = \left\lbrack \frac{\beta{k_{qq}\left\lbrack {1 + {\alpha\left( {k_{hh} - k_{qh}} \right)}} \right\rbrack}}{{\alpha k_{hh}} + {\beta k_{qq}} + {\alpha{\beta\left( {{k_{qq}k_{hh}} - {k_{hq}k_{qh}}} \right)}} + 1} \right\rbrack} & (59)\end{matrix}$

with the constants:

$\begin{matrix}{{k_{hh} = {{\tilde{h}}^{H}R_{nn}^{- 1}\tilde{h}}};{k_{qq} = {{\hat{h}}^{H}R_{nn}^{- 1}\hat{h}}};{k_{hq} = {{\tilde{h}}^{H}R_{nn}^{- 1}\hat{h}}};{k_{qh} = {{\hat{h}}^{H}R_{nn}^{- 1}\tilde{h}}}} & (60)\end{matrix}$

As in section III-C, this integrated MVDR_(a,e) beamformer also reveals that the MVDR_(a,e) beamformer based on a priori assumptions from (48) and that which is based on estimated quantities from (54) can be combined according to the functions g_(pr)(α, β) and g_(est)(α, β) respectively.
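The relationship between the cost function (56) and the weighted combination (57) to (60) can be checked numerically. The sketch below is illustrative only and assumes NumPy; it compares the closed-form combination against a direct solution of the stationarity condition of (56), namely (R_(nn) + α{tilde over (h)}{tilde over (h)}^(H) + βĥĥ^(H))w = α{tilde over (h)} + βĥ. The RTF vectors, noise correlation matrix, and tuning values are hypothetical.

```python
import numpy as np

def closed_form_weights(R_nn, h_pr, h_est, alpha, beta):
    """Integrated filter via (57)-(60): g_pr * w_pr + g_est * w_est."""
    R_inv = np.linalg.inv(R_nn)
    k_hh = h_pr.conj() @ R_inv @ h_pr
    k_qq = h_est.conj() @ R_inv @ h_est
    k_hq = h_pr.conj() @ R_inv @ h_est
    k_qh = h_est.conj() @ R_inv @ h_pr
    den = alpha * k_hh + beta * k_qq + alpha * beta * (k_qq * k_hh - k_hq * k_qh) + 1
    g_pr = alpha * k_hh * (1 + beta * (k_qq - k_hq)) / den     # (58)
    g_est = beta * k_qq * (1 + alpha * (k_hh - k_qh)) / den    # (59)
    w_pr = R_inv @ h_pr / k_hh        # MVDR with the a priori RTF vector, as in (48)
    w_est = R_inv @ h_est / k_qq      # MVDR with the estimated RTF vector, as in (54)
    return g_pr * w_pr + g_est * w_est, g_pr, g_est

def direct_weights(R_nn, h_pr, h_est, alpha, beta):
    """Solve the zero-gradient condition of the cost function (56) directly."""
    lhs = R_nn + alpha * np.outer(h_pr, h_pr.conj()) + beta * np.outer(h_est, h_est.conj())
    return np.linalg.solve(lhs, alpha * h_pr + beta * h_est)

rng = np.random.default_rng(2)
M = 4                                              # hypothetical microphone count
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_nn = A @ A.conj().T + M * np.eye(M)
h_pr = np.concatenate(([1.0], rng.standard_normal(M - 1) + 1j * rng.standard_normal(M - 1)))
h_est = np.concatenate(([1.0], rng.standard_normal(M - 1) + 1j * rng.standard_normal(M - 1)))

alpha, beta = 3.0, 0.7                             # hypothetical tuning values
w_closed, g_pr, g_est = closed_form_weights(R_nn, h_pr, h_est, alpha, beta)
w_direct = direct_weights(R_nn, h_pr, h_est, alpha, beta)
```

In this sketch the two filters agree to numerical precision, which is consistent with interpreting g_(pr)(α, β) and g_(est)(α, β) as the weights that blend the two MVDR_(a,e) filters.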

This integrated beamformer can also be expressed in the pre-whitened-transformed domain as follows:

$\begin{matrix}{w_{int_{\lambda}} = {{{g_{pr}\left( {\alpha,\ \beta} \right)}TL^{- H}\frac{l_{M_{a}}v_{1}}{{\overset{˜}{h}}_{a}}{Jv}_{\max}} + {{g_{est}\left( {\alpha,\beta} \right)}TL^{- H}\eta_{q}q_{\max}}}} & (61)\end{matrix}$

and the constants equivalently, but alternatively defined as:

$\begin{matrix}{{k_{hh} = {{\underline{\tilde{h}}}^{H}\underline{\tilde{h}}}};{k_{qq} = {{\underline{\hat{h}}}^{H}\underline{\hat{h}}}};{k_{hq} = {{\underline{\tilde{h}}}^{H}\underline{\hat{h}}}};{k_{qh} = {{\underline{\hat{h}}}^{H}\underline{\tilde{h}}}}} & (62)\end{matrix}$

where {tilde over (h)} and ĥ are given in (88) from Appendix C and (97) from Appendix D respectively.

The resulting speech estimate from this integrated beamformer is then given by:

$\begin{matrix}{\begin{aligned}{\hat{z}}_{int} &= {g_{pr}^{*}\left( {\alpha,\beta} \right)}\frac{l_{M_{a}}v_{1}}{{\tilde{h}}_{a}}v_{\max}^{H}\begin{bmatrix}{\underline{y}}_{a,M_{a}} \\{\underline{y}}_{e}\end{bmatrix} + {g_{est}^{*}\left( {\alpha,\beta} \right)}\eta_{q}q_{\max}^{H}\underline{y} \\ &= {g_{pr}^{*}\left( {\alpha,\beta} \right)}{\tilde{z}}_{1} + {g_{est}^{*}\left( {\alpha,\beta} \right)}{\hat{z}}_{1}\end{aligned}} & (63)\end{matrix}$

The benefit of the pre-whitened-transformed domain is once again apparent. With such an integrated beamformer, the pre-whitened-transformed signals can be directly filtered accordingly, and then combined with the appropriate weightings as defined by the functions g_(pr)(α, β) and g_(est)(α, β), to yield the respective speech estimate. These functions g_(pr)(α, β) and g_(est)(α, β) can be tuned such as to emphasize the result from an MVDR beamformer that uses either an a priori assumed RTF vector or an estimated RTF vector. This results in a digital signal processing scheme as depicted in FIG. 7.

More specifically, FIG. 7 is a block diagram of an integrated MVDR_(a,e) beamformer 525 in accordance with embodiments presented herein. The integrated MVDR_(a,e) beamformer 525 comprises a plurality of processing blocks, which include transformation block 502 and pre-whitening block 508. As described above with reference to FIGS. 5 and 6, the transformation block 502 represents the first transformation of section II-B, in which the LMA signals pass through a blocking matrix 504 and a matched filter 506, while the XM signals are unaltered. The pre-whitening block 508 represents the pre-whitening operation. The output of the pre-whitening block 508 is signals in the pre-whitened-transformed domain, referred to as pre-whitened-transformed signals 509.

Also shown in FIG. 7 are two processing branches 513(1) and 513(2) that each operate based on all or part of the pre-whitened-transformed signals 509. The first processing branch 513(1) includes a filter 530 which, as described above with reference to FIG. 5, uses the pre-whitened-transformed signals 509 to generate an a priori speech estimate, {tilde over (z)}₁ (i.e., an estimate of the speech in the received sound signals, based on a priori assumptions for the definition of the RTF vector for the LMA signals, while estimating only the RTF vector for the XM signals). The a priori speech estimate, {tilde over (z)}₁, is an example of an a priori estimate of at least one target sound in the received sound signals.

The first branch 513(1) also comprises a first weighting block 516. The first weighting block 516 is configured to weight the a priori speech estimate, {tilde over (z)}₁, in accordance with the complex conjugate of the function g_(pr)(α, β) (i.e., (58) and (63), above). More generally, the first weighting block 516 is configured to weight the a priori speech estimate, {tilde over (z)}₁, in accordance with a cost function controlled by a plurality of tuning parameters (e.g., (α, β)). The tuning parameters of the cost function (e.g., g_(pr)(α, β)) are set based on one or more confidence measures 518 generated for the a priori speech estimate, {tilde over (z)}₁. The one or more confidence measures 518 represent an assessment or estimate of the accuracy/reliability of the a priori speech estimate, {tilde over (z)}₁, and hence the accuracy of the partial a priori assumed RTF vector and partial estimated RTF vector used to generate that estimate (i.e., using a priori assumptions for the definition of the RTF vector for the LMA signals, while estimating only the RTF vector for the XM signals). The first weighting block 516 generates a weighted a priori speech estimate, shown in FIG. 7 by arrow 519.

The second branch 513(2) includes the filter 532 (i.e., (55), above), which uses the pre-whitened-transformed signals 509 to generate a direct speech estimate, {circumflex over (z)}₁ (i.e., a speech estimate generated using an estimated RTF vector including both the LMA and XM signals). The second branch 513(2) also comprises a second weighting block 520. The second weighting block 520 is configured to weight the direct speech estimate, {circumflex over (z)}₁, in accordance with the complex conjugate of the function g_(est)(α, β) (i.e., (59) and (63), above). More generally, the second weighting block 520 is configured to weight the direct speech estimate, {circumflex over (z)}₁, in accordance with a cost function controlled by a plurality of tuning parameters (e.g., (α, β)). The tuning parameters of the cost function (e.g., g_(est)(α, β)) are set based on one or more confidence measures 522 generated for the direct speech estimate, {circumflex over (z)}₁. The one or more confidence measures 522 represent an assessment or estimate of the accuracy/reliability of the direct speech estimate, {circumflex over (z)}₁, and hence the accuracy of the estimated RTF vector including both the LMA and XM signals. The second weighting block 520 generates a weighted direct speech estimate, shown in FIG. 7 by arrow 523.

FIG. 7 also illustrates processing block 524, which integrates/combines the weighted a priori speech estimate 519 and the weighted direct speech estimate 523. The combination of the weighted a priori speech estimate 519 and the weighted direct speech estimate 523 is referred to as an integrated speech estimate, {circumflex over (z)}_(int) (i.e., (63), above). The integrated speech estimate, {circumflex over (z)}_(int), may be used for subsequent processing in the device (e.g., auditory prosthesis).

With this integrated beamformer for both the LMA and XMs, the decision process is now, as shown in the flowchart of FIG. 8, a two-stage process 840. More specifically, the process 840 comprises two main decisions, referred to as decisions 842 and 844. Referring first to 842, it is determined whether or not the XM signals are reliable (i.e., whether or not to use the XM signals). If the XM signals are not reliable, the system uses MVDR with LMA only (i.e., MVDR_(a)). If the XM signals are reliable, the system uses MVDR with LMA and XMs (i.e., MVDR_(a,e)).

At 844, after determining whether or not the XM signals should be used, a decision is made as to whether or not the estimated RTF vector is reliable. In other words, a decision can then be made on how much to weight the a priori assumed RTF vector and the estimated RTF vector. This decision is controlled by α and β in the same manner as for the integrated MVDR_(a) beamformer from section III-C. In the case where the XM is used, the a priori assumed RTF vector consists of an a priori assumed RTF vector for the LMA signals and an estimated RTF vector for the XM signals, while the estimated RTF vector is for both the LMA and XM signals.
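The two-stage decision process can be sketched in code as follows. This is a hypothetical illustration only: the names `BeamformerConfig` and `decide`, the scalar confidence input in the range 0 to 1, and the linear mapping from that confidence to (α, β) are assumptions made for the example and are not specified by the embodiments described herein.

```python
from dataclasses import dataclass

@dataclass
class BeamformerConfig:
    mode: str      # "MVDR_a" (LMA only) or "MVDR_a,e" (LMA and XMs)
    alpha: float   # weight on the a priori RTF constraint
    beta: float    # weight on the estimated RTF constraint

def decide(xm_reliable: bool, rtf_confidence: float, scale: float = 100.0) -> BeamformerConfig:
    """Hypothetical two-stage decision modelled on decisions 842 and 844."""
    # Stage 1 (842): use the XM signals only if they are deemed reliable.
    mode = "MVDR_a,e" if xm_reliable else "MVDR_a"
    # Stage 2 (844): trade off the a priori and estimated RTF vectors.
    # Assumed mapping: confidence 0 puts all weight on the a priori constraint,
    # confidence 1 puts all weight on the estimated constraint.
    c = min(max(rtf_confidence, 0.0), 1.0)
    return BeamformerConfig(mode=mode, alpha=scale * (1.0 - c), beta=scale * c)

fallback = decide(xm_reliable=False, rtf_confidence=0.0)
full = decide(xm_reliable=True, rtf_confidence=1.0)
```

Here `fallback` selects the LMA-only beamformer with β=0, so only the a priori constraint is active, while `full` selects the beamformer using both LMA and XM signals with all weight on the estimated RTF vector.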

In the second stage of the decision process, it should be noted that in order to simplify the tuning, α and β could be made inversely proportional, and can even be tuned such that g_(pr)(α, β) and g_(est)(α, β) form a convex combination. Alternatively, if it is imposed that α→∞, then this preserves the a priori constraint and it is only β that remains to be tuned, which would be that of a contingency noise reduction strategy. In the case where both α→∞ and β→∞, this corresponds to two hard constraints imposed upon the noise minimization, and is then considered as a linearly constrained minimum variance (LCMV) beamformer. It is also noted for the case of the MVDR_(a) where α→∞, β=0, that the original MVDR_(a) with a priori constraints is achieved. Hence, the original beamformer has not been compromised and can be reverted to at any time with this particular tuning.
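The limiting cases described above can be illustrated with a small numerical sketch, assuming NumPy. The integrated filter is obtained here by solving the zero-gradient condition of the cost function directly, with a large finite value standing in for α→∞ or β→∞. The three-element noise correlation matrix and RTF vectors are hypothetical values chosen only for the example.

```python
import numpy as np

def integrated_weights(R_nn, h_pr, h_est, alpha, beta):
    """Minimiser of w^H R_nn w + alpha|w^H h_pr - 1|^2 + beta|w^H h_est - 1|^2."""
    lhs = R_nn + alpha * np.outer(h_pr, h_pr.conj()) + beta * np.outer(h_est, h_est.conj())
    return np.linalg.solve(lhs, alpha * h_pr + beta * h_est)

def mvdr_weights(R_nn, h):
    """Standard MVDR filter with a single hard constraint w^H h = 1."""
    v = np.linalg.solve(R_nn, h)
    return v / (h.conj() @ v)

# Hypothetical three-microphone example.
R_nn = np.eye(3) + 0.2 * np.ones((3, 3))
h_pr = np.array([1.0, 1.0, 1.0], dtype=complex)      # assumed a priori RTF vector
h_est = np.array([1.0, 0.5 + 0.2j, -0.3j])            # assumed estimated RTF vector

big = 1e6   # finite stand-in for "tending to infinity"

# alpha large, beta = 0: the filter approaches the MVDR with a priori constraint.
w_prior = integrated_weights(R_nn, h_pr, h_est, big, 0.0)
gap_to_mvdr = np.max(np.abs(w_prior - mvdr_weights(R_nn, h_pr)))

# alpha and beta both large: both constraints are (nearly) met, as in an LCMV.
w_lcmv = integrated_weights(R_nn, h_pr, h_est, big, big)
resp_pr = w_lcmv.conj() @ h_pr
resp_est = w_lcmv.conj() @ h_est
```

In this sketch `gap_to_mvdr` is close to zero, and the responses `resp_pr` and `resp_est` are both close to one, matching the single-constraint and two-constraint limits respectively.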

The various noise reduction strategies encompassed by this integrated beamformer are summarized in FIG. 9. More specifically, FIG. 9 includes a table, referred to as Table I, which illustrates limiting cases of α, β for the various MVDR beamformers.

The integrated noise reduction techniques presented herein may be implemented in a number of devices/systems that include a local microphone array (LMA) to capture sound signals. These devices/systems include, for example, auditory prostheses (e.g., cochlear implants, acoustic hearing aids, auditory brainstem stimulators, bone conduction devices, middle ear auditory prostheses, direct acoustic stimulators, bimodal auditory prostheses, bilateral auditory prostheses, etc.), computing devices (e.g., mobile phones, tablet computers, etc.), conference phones, hands-free telephone systems, etc. FIGS. 10A, 10B, 11, and 12 are schematic block diagrams of example devices configured to implement the integrated noise reduction techniques presented herein. It is to be appreciated that these examples are illustrative and that, as noted, the integrated noise reduction techniques presented herein may be implemented in a number of different devices/systems.

Referring first to FIG. 10A, shown is a schematic diagram of an exemplary cochlear implant 1000 configured to implement aspects of the techniques presented herein, while FIG. 10B is a block diagram of the cochlear implant 1000. For ease of illustration, FIGS. 10A and 10B will be described together.

The cochlear implant 1000 comprises an external component 1002 and aninternal/implantable component 1004. The external component 1002includes a sound processing unit 1012 that is directly or indirectlyattached to the body of the recipient, an external coil 1006 and,generally, a magnet (not shown in FIG. 10A) fixed relative to theexternal coil 1006.

The sound processing unit 1012 comprises a local microphone array (LMA)1013, comprised of microphones 1008(1) and 1008(2), configured toreceive sound input signals. In this example, the sound processing unit1012 may also include one or more auxiliary input devices 1009, such asone or more telecoils, audio ports, data ports, cable ports, etc., and awireless transmitter/receiver (transceiver) 1011.

The sound processing unit 1012 also includes, for example, at least onebattery 1007, a radio-frequency (RF) transceiver 1021, and a processingblock 1050. The processing block 1050 comprises a number of elements,including an integrated noise reduction module 1025 and a soundprocessor 1033. The processing block 1050 may also include otherelements that, have for ease of illustration, been omitted from FIG.10B. Each of the integrated noise reduction module 1025 and a soundprocessor 1033 may be formed by one or more processors (e.g., one ormore Digital Signal Processors (DSPs), one or more uC cores, etc.),firmware, software, etc. arranged to perform operations describedherein. That is, the integrated noise reduction module 1025 and a soundprocessor 1033 may each be implemented as firmware elements, partiallyor fully implemented with digital logic gates in one or moreapplication-specific integrated circuits (ASICs), partially or fullyimplemented in software, etc.

The integrated noise reduction module 1025 is configured to perform the integrated noise reduction techniques described elsewhere herein. For example, the integrated noise reduction module 1025 corresponds to the integrated MVDR_(a) beamformer 125 and the MVDR_(a,e) beamformer 525, described above. As such, in different embodiments, the integrated noise reduction module 1025 may include the processing blocks described above with reference to FIGS. 4 and 7, as well as other combinations of processing blocks configured to perform the integrated noise reduction techniques described elsewhere herein.

As noted above, the integrated noise reduction techniques, and thus the integrated noise reduction module 1025, generate an integrated speech estimate from sound signals received via at least the LMA 1013. Shown in FIG. 10 is at least one optional external microphone (XM) 1017, which may also be in communication with the sound processing unit 1012. If present, the XM 1017 is configured to capture sound signals and provide XM signals to the sound processing unit 1012. These XM signals may also be used to generate the integrated speech estimate. The sound processor 1033 is configured to use the integrated speech estimate (generated from one or both of the LMA signals and the XM signals) to generate stimulation signals for delivery to the recipient.

Returning to the example embodiment of FIGS. 10A and 10B, the implantable component 1004 comprises an implant body (main module) 1014, a lead region 1016, and an intra-cochlear stimulating assembly 1018, all configured to be implanted under the skin/tissue (tissue) 1005 of the recipient. The implant body 1014 generally comprises a hermetically-sealed housing 1015 in which RF interface circuitry 1024 and a stimulator unit 1020 are disposed. The implant body 1014 also includes an internal/implantable coil 1022 that is generally external to the housing 1015, but which is connected to the RF interface circuitry 1024 via a hermetic feedthrough (not shown in FIG. 10B).

As noted, stimulating assembly 1018 is configured to be at least partially implanted in the recipient's cochlea 1037. Stimulating assembly 1018 includes a plurality of longitudinally spaced intra-cochlear electrical stimulating contacts (electrodes) 1026 that collectively form a contact or electrode array 1028 for delivery of electrical stimulation (current) to the recipient's cochlea. Stimulating assembly 1018 extends through an opening in the recipient's cochlea (e.g., cochleostomy, the round window, etc.) and has a proximal end connected to stimulator unit 1020 via lead region 1016 and a hermetic feedthrough (not shown in FIG. 10B). Lead region 1016 includes a plurality of conductors (wires) that electrically couple the electrodes 1026 to the stimulator unit 1020.

As noted, the cochlear implant 1000 includes the external coil 1006 and the implantable coil 1022. The coils 1006 and 1022 are typically wire antenna coils each comprised of multiple turns of electrically insulated single-strand or multi-strand platinum or gold wire. Generally, a magnet is fixed relative to each of the external coil 1006 and the implantable coil 1022. The magnets fixed relative to the external coil 1006 and the implantable coil 1022 facilitate the operational alignment of the external coil with the implantable coil. This operational alignment of the coils 1006 and 1022 enables the external component 1002 to transmit data, as well as possibly power, to the implantable component 1004 via a closely-coupled wireless link formed between the external coil 1006 and the implantable coil 1022. In certain examples, the closely-coupled wireless link is a radio frequency (RF) link. However, various other types of energy transfer, such as infrared (IR), electromagnetic, capacitive and inductive transfer, may be used to transfer the power and/or data from an external component to an implantable component and, as such, FIG. 10B illustrates only one example arrangement.

As noted above, the integrated noise reduction module 1025 is configured to generate an integrated speech estimate, and the sound processor 1033 is configured to use the integrated speech estimate to generate stimulation signals for delivery to the recipient. More specifically, the sound processor 1033 (e.g., one or more processing elements implementing firmware, software, etc.) is configured to use the integrated speech estimate to generate stimulation control signals 1036 that represent electrical stimulation for delivery to the recipient. In the embodiment of FIG. 10B, the stimulation control signals 1036 are provided to the RF transceiver 1021, which transcutaneously transfers the stimulation control signals 1036 (e.g., in an encoded manner) to the implantable component 1004 via external coil 1006 and implantable coil 1022. That is, the stimulation control signals 1036 are received at the RF interface circuitry 1024 via implantable coil 1022 and provided to the stimulator unit 1020. The stimulator unit 1020 is configured to utilize the stimulation control signals 1036 to generate electrical stimulation signals (e.g., current signals) for delivery to the recipient's cochlea via one or more stimulating contacts 1026. In this way, cochlear implant 1000 electrically stimulates the recipient's auditory nerve cells, bypassing absent or defective hair cells that normally transduce acoustic vibrations into neural activity, in a manner that causes the recipient to perceive one or more components of the input audio signals.

FIGS. 10A and 10B illustrate an arrangement in which the cochlear implant 1000 includes an external component. However, it is to be appreciated that embodiments of the present invention may be implemented in cochlear implants having alternative arrangements. For example, the techniques presented herein could also be implemented in a totally implantable or mostly implantable auditory prosthesis where components shown in sound processing unit 1012, such as processing block 1050, could instead be implanted in the recipient.

FIG. 11 is a functional block diagram of one example arrangement for a bone conduction device 1100 in accordance with embodiments presented herein. Bone conduction device 1100 is configured to be positioned at (e.g., behind) a recipient's ear. The bone conduction device 1100 comprises a microphone array 1113, an electronics module 1170, a transducer 1171, a user interface 1172, and a power source 1173.

The local microphone array (LMA) 1113 comprises microphones 1108(1) and 1108(2) that are configured to convert received sound signals 1116 into LMA signals. Although not shown in FIG. 11, bone conduction device 1100 may also comprise other sound inputs, such as ports, telecoils, etc.

The LMA signals are provided to electronics module 1170 for further processing. In general, electronics module 1170 is configured to convert the LMA signals into one or more transducer drive signals 1180 that activate transducer 1171. More specifically, electronics module 1170 includes, among other elements, a processing block 1150 and transducer drive components 1176.

The processing block 1150 comprises a number of elements, including an integrated noise reduction module 1125 and sound processor 1133. Each of the integrated noise reduction module 1125 and the sound processor 1133 may be formed by one or more processors (e.g., one or more Digital Signal Processors (DSPs), one or more uC cores, etc.), firmware, software, etc. arranged to perform operations described herein. That is, the integrated noise reduction module 1125 and the sound processor 1133 may each be implemented as firmware elements, partially or fully implemented with digital logic gates in one or more application-specific integrated circuits (ASICs), partially or fully implemented in software, etc.

The integrated noise reduction module 1125 is configured to perform the integrated noise reduction techniques described elsewhere herein. For example, the integrated noise reduction module 1125 corresponds to the integrated MVDR_(a) beamformer 125 and the MVDR_(a,e) beamformer 525, described above. As such, in different embodiments, the integrated noise reduction module 1125 may include the processing blocks described above with reference to FIGS. 4 and 7, as well as other combinations of processing blocks configured to perform the integrated noise reduction techniques described elsewhere herein. Although not shown in FIG. 11, at least one optional external microphone (XM) may be in communication with the bone conduction device 1100. If present, the XM is configured to capture sound signals and provide XM signals to the bone conduction device 1100 for processing by the integrated noise reduction module 1125 (i.e., the XM signals may also be used to generate the integrated speech estimate).

The sound processor 1133 is configured to process the integrated speech estimate (generated from one or both of the LMA signals and the XM signals) for use by the transducer drive components 1176. The transducer drive components 1176 generate transducer drive signal(s) 1180 which are provided to the transducer 1171. The transducer 1171 illustrates an example of a stimulation unit that receives the transducer drive signal(s) 1180 and generates vibrations for delivery to the skull of the recipient via a transcutaneous or percutaneous anchor system (not shown) that is coupled to bone conduction device 1100. Delivery of the vibration causes motion of the cochlea fluid in the recipient's contralateral functional ear, thereby activating the hair cells in the functional ear.

FIG. 11 also illustrates the power source 1173 that provides electrical power to one or more components of bone conduction device 1100. Power source 1173 may comprise, for example, one or more batteries. For ease of illustration, power source 1173 has been shown connected only to user interface 1172 and electronics module 1170. However, it should be appreciated that power source 1173 may be used to supply power to any electrically powered circuits/components of bone conduction device 1100.

User interface 1172 allows the recipient to interact with bone conduction device 1100. For example, user interface 1172 may allow the recipient to adjust the volume, alter the speech processing strategies, power on/off the device, etc. Although not shown in FIG. 11, bone conduction device 1100 may further include an external interface that may be used to connect electronics module 1170 to an external device, such as a fitting system.

FIG. 12 is a block diagram of an arrangement of a mobile computing device 1200, such as a smartphone, configured to implement the integrated noise reduction techniques presented herein. It is to be appreciated that FIG. 12 is merely illustrative.

Mobile computing device 1200 first comprises an antenna 1236 and a telecommunications interface 1238 that are configured for communication on a telecommunications network. The telecommunications network over which the antenna 1236 and the telecommunications interface 1238 communicate may be, for example, a Global System for Mobile Communications (GSM) network, a code division multiple access (CDMA) network, a time division multiple access (TDMA) network, or other kinds of networks.

The mobile computing device 1200 also includes a wireless local area network interface 1240 and a short-range wireless interface/transceiver 1242 (e.g., an infrared (IR) or Bluetooth® transceiver). Bluetooth® is a registered trademark owned by the Bluetooth® SIG. The wireless local area network interface 1240 allows the mobile computing device 1200 to connect to the Internet, while the short-range wireless transceiver 1242 enables the mobile computing device 1200 to wirelessly communicate (i.e., directly receive and transmit data to/from another device via a wireless connection), such as over a 2.4 Gigahertz (GHz) link. It is to be appreciated that any other interfaces now known or later developed including, but not limited to, Institute of Electrical and Electronics Engineers (IEEE) 802.11, IEEE 802.16 (WiMAX), fixed line, Long Term Evolution (LTE), etc., may also or alternatively form part of the mobile computing device 1200.

In the example of FIG. 12, mobile computing device 1200 also comprises an audio port 1244, a local microphone array (LMA) 1213, a speaker 1248, a display screen 1258, a subscriber identity module or subscriber identification module (SIM) card 1252, a battery 1254, a user interface 1256, one or more processors 1250, and a memory 1260. The LMA 1213 includes microphones 1208(1) and 1208(2). Stored in memory 1260 are integrated noise reduction logic 1225 and sound processing logic 1233.

The display screen 1258 is an output device, such as a liquid crystal display (LCD), for presentation of visual information to the cochlear implant recipient. The user interface 1256 may take many different forms and may include, for example, a keypad, keyboard, mouse, touchscreen, display screen, etc. Memory 1260 may comprise any one or more of read only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. The one or more processors 1250 are, for example, microprocessors or microcontrollers that execute instructions for the integrated noise reduction logic 1225 and sound processing logic 1233.

When executed by the one or more processors 1250, the integrated noise reduction logic 1225 is configured to perform the integrated noise reduction techniques described elsewhere herein. For example, the integrated noise reduction logic 1225 corresponds to the integrated MVDR_(a) beamformer 125 and the MVDR_(a,e) beamformer 525, described above. As such, in different embodiments, the integrated noise reduction logic 1225 may include software forming the processing blocks described above with reference to FIGS. 4 and 7, as well as other combinations of processing blocks configured to perform the integrated noise reduction techniques described elsewhere herein to generate an integrated speech estimate. When executed by the one or more processors 1250, the sound processing logic 1233 is configured to perform sound processing operations using the integrated speech estimate.

FIG. 13 is a flowchart of a method 1390 performed/executed by a device comprising at least a local microphone array (LMA), in accordance with embodiments presented herein. Method 1390 begins at 1392 where sound signals are received with at least the local microphone array of the device. The received sound signals comprise/include at least one target sound.

At 1394, an a priori estimate of the at least one target sound in the received sound signals is generated, wherein the a priori estimate is based at least on a predetermined location of a source of the at least one target sound. At 1396, a direct estimate of the at least one target sound in the received sound signals is generated, wherein the direct estimate is based at least on a real-time estimate of a location of a source of the at least one target sound. At 1398, a weighted combination of the a priori estimate and the direct estimate is generated, where the weighted combination is an integrated estimate of the target sound. Subsequent sound processing operations may be performed in the device using the integrated estimate of the target sound.

In certain embodiments, the a priori estimate of the at least one target sound is generated using only an a priori relative transfer function (RTF) vector generated from the received sound signals. In certain embodiments, the direct estimate of the at least one target sound is generated using only an estimated relative transfer function (RTF) vector for the received sound signals.

In certain embodiments, the weighted combination of the a priori estimate and the direct estimate is generated by weighting the a priori estimate in accordance with a first cost function controlled by a first set of tuning parameters to generate a weighted a priori estimate; and weighting the direct estimate in accordance with a second cost function controlled by a second set of tuning parameters to generate a weighted direct estimate. The weighted direct estimate and the weighted a priori estimate are then mixed with one another. The first set of tuning parameters may be set based on one or more confidence measures associated with the a priori estimate of the at least one target sound, wherein the one or more confidence measures represent an estimate of a reliability of the a priori estimate. The second set of tuning parameters may be set based on one or more confidence measures associated with the direct estimate of the at least one target sound, wherein the one or more confidence measures represent an estimate of a reliability of the direct estimate.
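By way of illustration only, the weighting and mixing described above can be sketched as follows. The function name `integrated_estimate`, its parameters, and the rule of normalising two scalar confidence values into mixing weights are assumptions made for this sketch; the weights of the embodiments follow from the cost functions and tuning parameters described herein and need not take this form.

```python
import numpy as np

def integrated_estimate(z_apriori, z_direct, conf_apriori, conf_direct):
    """Mix an a priori estimate and a direct estimate of the target sound.

    Assumption: each mixing weight is the corresponding confidence divided
    by the sum of both confidences. This is a stand-in for the weights that
    the cost functions and tuning parameters would produce.
    """
    z_apriori = np.asarray(z_apriori, dtype=complex)
    z_direct = np.asarray(z_direct, dtype=complex)
    total = conf_apriori + conf_direct
    if total <= 0:
        raise ValueError("at least one confidence measure must be positive")
    weight_apriori = conf_apriori / total   # high when the a priori RTF is trusted
    weight_direct = conf_direct / total     # high when the estimated RTF is trusted
    # Weighted combination: the integrated estimate of the target sound.
    return weight_apriori * z_apriori + weight_direct * z_direct
```

With equal confidences the result is the average of the two estimates, and with zero confidence in the direct estimate the result falls back to the a priori estimate.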

As detailed above, presented herein are integrated noise reduction techniques, sometimes referred to as an integrated beamformer (e.g., an integrated MVDR_(a) beamformer or an integrated MVDR_(a,e) beamformer). In general, the integrated noise reduction techniques combine the use of an a priori (i.e., predetermined, assumed, or pre-defined) location of a target sound source with a real-time estimated location of the sound source.

It is to be appreciated that the above described embodiments are not mutually exclusive and that the various embodiments can be combined in various manners and arrangements.

The invention described and claimed herein is not to be limited in scope by the specific preferred embodiments herein disclosed, since these embodiments are intended as illustrations, and not limitations, of several aspects of the invention. Any equivalent embodiments are intended to be within the scope of this invention. Indeed, various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims.

Appendix

I. Appendix A—MVDR_(a) with a Priori Assumed RTF Vector

A pre-whitened-transformed version of the a priori assumed RTF vector can be considered where:

$\begin{matrix}{{\underset{\_}{\overset{\sim}{h}}}_{a} = {{L_{a}^{- 1}T_{a}^{H}{\overset{\sim}{h}}_{a}} = \begin{bmatrix}0 \\\vdots \\0 \\\frac{{\overset{˜}{h}}_{a}}{l_{M_{a}}}\end{bmatrix}}} & (64)\end{matrix}$

where l_(M_a) is the bottom-right element in L_(a). Using the definition from (16), i.e., R_(n_a n_a)⁻¹=(T_(a)L_(a)L_(a)^(H)T_(a)^(H))⁻¹=T_(a)L_(a)^(−H)L_(a)⁻¹T_(a)^(H), the MVDR_(a) filter of (25) can then be re-written as:

$\begin{matrix}{{\hat{w}}_{a} = {T_{a}L_{a}^{- H}{\underset{\_}{\overset{\sim}{w}}}_{a}}} & (65)\end{matrix}$

where

$\begin{matrix}{\underset{\_}{\overset{\sim}{w}}}_{a} = \frac{{\underset{\_}{\overset{\sim}{h}}}_{a}}{{\underset{\_}{\overset{\sim}{h}}}_{a}^{H}{\underset{\_}{\overset{\sim}{h}}}_{a}} = \begin{bmatrix}0 \\\vdots \\0 \\{\underset{\_}{\overset{\sim}{w}}}_{a,M_{a}}\end{bmatrix} = \begin{bmatrix}0 \\\vdots \\0 \\\frac{l_{M_{a}}}{{\overset{˜}{h}}_{a}}\end{bmatrix} & (66)\end{matrix}$

Substitution of (65) into (26) yields the speech estimate as:

$\begin{matrix}\begin{matrix}{{\overset{\sim}{z}}_{a,1} = {{{\underset{\_}{\overset{\sim}{w}}}_{a}}^{H}\underset{\underset{{\underset{\_}{y}}_{a}}{︸}}{L_{a}^{- 1}T_{a}^{H}y_{a}}}} \\{= {\frac{l_{M_{a}}}{{\overset{˜}{h}}_{a}}{\underset{\_}{y}}_{a,M_{a}}}}\end{matrix} & (67)\end{matrix}$

II. Appendix B—MVDR_(a) with Estimated RTF Vector

As opposed to using the raw signal correlation matrices, the estimation problem of (28) can be equivalently formulated first in the transformed domain since the Frobenius norm is invariant under a unitary transformation, therefore:

$\begin{matrix}{\min\limits_{{\hat{R}}_{{xa},{r\; 1}}}\left\| {T_{a}^{H}\left( {\left( {R_{y_{a}y_{a}} - R_{n_{a}n_{a}}} \right) - {\hat{R}}_{{xa},{r\; 1}}} \right)T_{a}} \right\|_{F}^{2}} & (68)\end{matrix}$

Furthermore, it has been argued that spatial pre-whitening should also be included in the optimisation problem. Consequently, the estimation problem can be re-framed in the pre-whitened-transformed domain as follows:

$\begin{matrix}{\min\limits_{{\hat{R}}_{{xa},{r\; 1}}}\left\| {\left( {{\underset{\_}{R}}_{y_{a}y_{a}} - {\underset{\_}{R}}_{n_{a}n_{a}}} \right) - {L_{a}^{- 1}T_{a}^{H}{\hat{R}}_{{xa},{r\; 1}}T_{a}L_{a}^{- H}}} \right\|_{F}^{2}} & (69)\end{matrix}$

where ${\underset{\_}{R}}_{y_{a}y_{a}} = L_{a}^{- 1}T_{a}^{H}R_{y_{a}y_{a}}T_{a}L_{a}^{- H}$, and ${\underset{\_}{R}}_{n_{a}n_{a}} = L_{a}^{- 1}T_{a}^{H}R_{n_{a}n_{a}}T_{a}L_{a}^{- H} = I_{M_{a}}$. The solution then follows from the GEVD on the matrix pencil $\{ {\underset{\_}{R}}_{y_{a}y_{a}},{\underset{\_}{R}}_{n_{a}n_{a}} \}$, and hence reduces to an EVD of ${\underset{\_}{R}}_{y_{a}y_{a}}$:

$\begin{matrix}{{\underset{\_}{R}}_{y_{a}y_{a}} = {P\lambda P^{H}}} & (70)\end{matrix}$

where P is a unitary matrix of eigenvectors and λ is a diagonal matrix with the associated eigenvalues in descending order. The estimated RTF vector is then defined using the principal (first in this case) eigenvector, p_(max):

$\begin{matrix}{{\hat{h}}_{a} = \frac{T_{a}L_{a}p_{\max}}{\eta_{p}}} & (71)\end{matrix}$

where the scaling η_(p)=e_(a1)^(T)T_(a)L_(a)p_(max) and the M_(a)×1 vector e_(a1)=[1 0 . . . 0]^(T).
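The estimation steps of (69) to (71) can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the transformation T_(a) is taken to be the identity matrix (the transformation used in the embodiments is defined elsewhere herein and is not reproduced), L_(a) is taken to be the lower-triangular Cholesky factor of the noise correlation matrix, and the function name `estimate_rtf` is hypothetical.

```python
import numpy as np

def estimate_rtf(R_yy, R_nn):
    """Estimate an RTF vector from signal and noise correlation matrices.

    Assumptions: T_a = I, and L_a is the Cholesky factor with R_nn = L L^H.
    """
    L = np.linalg.cholesky(R_nn)                  # pre-whitening factor L_a
    L_inv = np.linalg.inv(L)
    # Pre-whitened correlation matrix, as in the definition below (69).
    R_white = L_inv @ R_yy @ L_inv.conj().T
    R_white = 0.5 * (R_white + R_white.conj().T)  # enforce Hermitian symmetry
    # eigh returns eigenvalues in ascending order, so the principal
    # eigenvector p_max is the last column.
    _, eigvecs = np.linalg.eigh(R_white)
    p_max = eigvecs[:, -1]
    h_unscaled = L @ p_max                        # T_a L_a p_max with T_a = I
    eta = h_unscaled[0]                           # scaling e_a1^T T_a L_a p_max
    return h_unscaled / eta                       # first element becomes 1
```

For a rank-1 speech model, R_yy = φ h h^H + R_nn with the first element of h equal to one, the function returns h up to numerical precision.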

This estimated RTF vector can now be used as an alternative to {tilde over (h)}_(a) for the MVDR_(a) defined in (25), and the resulting filter is given by:

$\begin{matrix}{{\hat{w}}_{a} = \frac{R_{n_{a}n_{a}}^{- 1}{\hat{h}}_{a}}{{\hat{h}}_{a}^{H}R_{n_{a}n_{a}}^{- 1}{\hat{h}}_{a}}} & (72)\end{matrix}$
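As a brief numerical illustration of (72), the following sketch computes an MVDR filter from a noise correlation matrix and an RTF vector. The function name `mvdr_filter` is hypothetical, and a linear solve is used in place of an explicit matrix inverse purely as an implementation choice for this sketch.

```python
import numpy as np

def mvdr_filter(R_nn, h):
    """Return w = R_nn^{-1} h / (h^H R_nn^{-1} h), the form of (72)."""
    h = np.asarray(h, dtype=complex)
    R_inv_h = np.linalg.solve(R_nn, h)        # R_nn^{-1} h without forming the inverse
    return R_inv_h / (h.conj() @ R_inv_h)     # normalise by h^H R_nn^{-1} h
```

A filter of this form satisfies the distortionless constraint w^H h = 1, so a target component arriving with RTF vector h passes through with unit gain at the reference microphone.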

This filter based on estimated quantities can also be reformulated in the pre-whitened-transformed domain. Starting with the definition of the pre-whitened-transformed version of ĥ_(a):

$\begin{matrix}{{\underset{\_}{\hat{h}}}_{a} = {{L_{a}^{- 1}T_{a}^{H}{\hat{h}}_{a}} = \frac{p_{\max}}{\eta_{p}}}} & (73)\end{matrix}$

Hence (72) becomes:

$\begin{matrix}{{\hat{w}}_{a} = {T_{a}L_{a}^{- H}{\underset{\_}{\hat{w}}}_{a}}} & (74)\end{matrix}$

where

$\begin{matrix}{{\underset{\_}{\hat{w}}}_{a} = {\frac{{\underset{\_}{\hat{h}}}_{a}}{{\underset{\_}{\hat{h}}}_{a}^{H}{\underset{\_}{\hat{h}}}_{a}} = {\eta_{p}^{*}p_{\max}}}} & (75)\end{matrix}$

Substitution of (74) into (32) yields the speech estimate as:

$\begin{matrix}\begin{matrix}{{\hat{z}}_{a,1} = {{\underset{\_}{\hat{w}}}_{a}^{H}\underset{\underset{\underset{\_}{y_{a}}}{︸}}{L_{a}^{- 1}T_{a}^{H}y_{a}}}} \\{= {\eta_{p}p_{\max}^{H}{\underset{\_}{y}}_{a}}}\end{matrix} & (76)\end{matrix}$
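The equivalence between the filter of (72) applied to the raw signals and the pre-whitened-transformed form of (76) can be checked numerically. The sketch below again assumes T_(a) = I and a Cholesky factor for L_(a); the function name `speech_estimate_two_ways` is hypothetical, and the check is an illustration rather than part of the derivation.

```python
import numpy as np

def speech_estimate_two_ways(R_yy, R_nn, y):
    """Compute the speech estimate via (76) and via (72) applied to y.

    Assumptions: T_a = I, and L_a is the Cholesky factor with R_nn = L L^H.
    """
    L = np.linalg.cholesky(R_nn)
    L_inv = np.linalg.inv(L)
    R_white = L_inv @ R_yy @ L_inv.conj().T
    R_white = 0.5 * (R_white + R_white.conj().T)   # enforce Hermitian symmetry
    _, eigvecs = np.linalg.eigh(R_white)
    p_max = eigvecs[:, -1]                         # principal eigenvector
    eta = (L @ p_max)[0]                           # scaling eta_p
    # Pre-whitened-transformed form (76): eta_p * p_max^H * (L^{-1} y).
    z_white = eta * (p_max.conj() @ (L_inv @ y))
    # Direct form: build the estimated RTF of (71) and the filter of (72).
    h_hat = (L @ p_max) / eta
    R_inv_h = np.linalg.solve(R_nn, h_hat)
    w_hat = R_inv_h / (h_hat.conj() @ R_inv_h)
    z_direct = w_hat.conj() @ y                    # w^H y
    return z_white, z_direct
```

Both returned values agree to numerical precision, which reflects that (76) needs only the principal eigenvector and the scaling once the signals have been pre-whitened.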

III. Appendix C—MVDR_(a) with Partial a Priori Assumed RTF Vector and Partial Estimated RTF Vector

Following the procedure as in (68), the transformation is firstly applied, also including the penalty term:

$\begin{matrix}{\min\limits_{{\hat{\Phi}}_{x,{r\; 1}},{\hat{h}}_{e}}\left\| {T^{H}\left( {\left( {R_{yy} - R_{{nn}_{\lambda}}} \right) - {{\hat{\Phi}}_{x,{r\; 1}}\begin{bmatrix}{\overset{\sim}{h}}_{a} \\{\hat{h}}_{e}\end{bmatrix}\begin{bmatrix}{\overset{\sim}{h}}_{a}^{H} & {\hat{h}}_{e}^{H}\end{bmatrix}}} \right)T} \right\|_{F}^{2}} & (77)\end{matrix}$

after which the pre-whitening operation can also be included in the optimisation problem:

$\begin{matrix}{\min\limits_{{\hat{\Phi}}_{x,{r\; 1}},{\hat{h}}_{e}}\left\| {\left( {{\underset{\_}{R}}_{yy} - {\underset{\_}{R}}_{nn}} \right) - {L^{- 1}T^{H}\left( {{\hat{\Phi}}_{x,{r\; 1}}\begin{bmatrix}{\overset{\sim}{h}}_{a} \\{\hat{h}}_{e}\end{bmatrix}\begin{bmatrix}{\overset{\sim}{h}}_{a}^{H} & {\hat{h}}_{e}^{H}\end{bmatrix}} \right)TL^{- H}}} \right\|_{F}^{2}} & (78)\end{matrix}$

where ${\underset{\_}{R}}_{yy} = L^{- 1}T^{H}R_{yy}TL^{- H}$ and ${\underset{\_}{R}}_{nn} = L^{- 1}T^{H}R_{{nn}_{\lambda}}TL^{- H} = I_{(M_{a} + M_{e})}$. Expansion of (78) then results in:

$\begin{matrix}{\min\limits_{{\hat{\Phi}}_{x,{r\; 1}},{\hat{h}}_{e}}\left\| {\begin{bmatrix}{\underset{\_}{K}}_{A} & {\underset{\_}{K}}_{B} \\{\underset{\_}{K}}_{C} & {\underset{\_}{K}}_{x +}\end{bmatrix} - \begin{bmatrix}0 & 0 \\0 & {\underset{\_}{K}}_{x,{r\; 1}}\end{bmatrix}} \right\|_{F}^{2}} & (79)\end{matrix}$

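where the block dimensions are such that ${\underset{\_}{K}}_{A}$ is an $(M_{a} - 1) \times (M_{a} - 1)$ matrix, ${\underset{\_}{K}}_{B}$ an $(M_{a} - 1) \times (M_{e} + 1)$ matrix, ${\underset{\_}{K}}_{C}$ an $(M_{e} + 1) \times (M_{a} - 1)$ matrix, and ${\underset{\_}{K}}_{x,{r\; 1}}$ and ${\underset{\_}{K}}_{x +}$ are $(M_{e} + 1) \times (M_{e} + 1)$ matrices realised as: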

$\begin{matrix}{{\underset{\_}{K}}_{x,{r\; 1}} = {J^{T}{\overset{\sim}{\underset{\_}{R}}}_{x,{r\; 1}}J}} & (80) \\{{\underset{\_}{K}}_{x +} = {{J^{T}{\underset{\_}{R}}_{yy}J} - \underset{\underset{I_{({M_{e} + 1})}}{︸}}{J^{T}{\underset{\_}{R}}_{nn}J}}} & (81)\end{matrix}$

where ${\overset{\sim}{\underset{\_}{R}}}_{x,{r\; 1}} = L^{- 1}T^{H}{\overset{\sim}{R}}_{x,{r\; 1}}TL^{- H}$ and $J = \lbrack 0_{(M_{e} + 1) \times (M_{a} - 1)}|I_{(M_{e} + 1)}\rbrack^{T}$ is a selection matrix. It is then evident that ${\underset{\_}{K}}_{x +}$ can essentially be constructed from the last (M_(e)+1) elements of the pre-whitened-transformed signals, namely that in relation to the last element of the LMA, ${\underset{\_}{y}}_{a,M_{a}}$, and those in relation to the XM signals, ${\underset{\_}{y}}_{e}$. Hence the first term of ${\underset{\_}{K}}_{x +}$ is equivalently:

$\begin{matrix}{{J^{T}{\underset{\_}{R}}_{yy}J} = {{\mathbb{E}}\left\{ {\begin{bmatrix}{{\underset{\_}{y}}_{a},_{M_{a}}} \\{\underset{\_}{y}}_{e}\end{bmatrix}\begin{bmatrix}{\underset{\_}{y}}_{a,M_{a}}^{H} & {\underset{\_}{y}}_{e}^{H}\end{bmatrix}} \right\}}} & (82)\end{matrix}$

and similarly for the second term of ${\underset{\_}{K}}_{x +}$. It follows that (79) then reduces to the following (M_(e)+1)×(M_(e)+1) matrix approximation problem:

$\begin{matrix}{\min\limits_{{\hat{\Phi}}_{x,{r\; 1}},{\hat{h}}_{e}}\left\| {{\underset{\_}{K}}_{x +} - {\underset{\_}{K}}_{x,{r\; 1}}} \right\|_{F}^{2}} & (83)\end{matrix}$

The solution then follows from the GEVD on the matrix pencil $\{ J^{T}{\underset{\_}{R}}_{yy}J,J^{T}{\underset{\_}{R}}_{nn}J \}$ and hence reduces to an EVD of $J^{T}{\underset{\_}{R}}_{yy}J$:

$\begin{matrix}{{J^{T}{\underset{\_}{R}}_{yy}J} = {V\Gamma V^{H}}} & (84)\end{matrix}$

where V is an (M_(e)+1)×(M_(e)+1) unitary matrix of eigenvectors and Γ is a diagonal matrix with the associated eigenvalues in descending order. The estimated RTF vector for the XM signals is then defined from the corresponding principal (first in this case) eigenvector, v_(max):

$\begin{matrix}{{\hat{h}}_{e} = {\frac{{\overset{\sim}{h}}_{a}}{l_{M_{a}}v_{1}}J_{e}^{T}{TLJv}_{\max}}} & (85)\end{matrix}$

where the selection matrix J_(e)=[0_(M_e×M_a)|I_(M_e)]^(T).

Finally, this estimate is then used to compute the corresponding MVDR_(a,e) filter with an a priori assumed RTF vector and a partially estimated RTF vector, along with the penalty term as:

$\begin{matrix}{\overset{\sim}{w} = \frac{R_{nn}^{- 1}\overset{\sim}{h}}{{\overset{\sim}{h}}^{H}R_{nn}^{- 1}\overset{\sim}{h}}} & (86)\end{matrix}$

where {tilde over (h)} as defined in (44) can be equivalently represented as:

$\begin{matrix}{\overset{\sim}{h} = {\frac{{\overset{\sim}{h}}_{a}}{l_{M_{a}}v_{1}}{TLJv}_{\max}}} & (87)\end{matrix}$

This filter can also be realised in the pre-whitened-transformed domain. The pre-whitened-transformed version of {tilde over (h)} can firstly be considered where:

$\begin{matrix}{\overset{\sim}{\underset{\_}{h}} = {{L^{- 1}T^{H}\overset{\sim}{h}} = {{\frac{{\overset{\sim}{h}}_{a}}{l_{M_{a}}v_{1}}{Jv}_{\max}} = {\frac{{\overset{\sim}{h}}_{a}}{l_{M_{a}}v_{1}}\begin{bmatrix}0 \\\vdots \\0 \\v_{1} \\v_{e}\end{bmatrix}}}}} & (88)\end{matrix}$

Therefore, (86) can be re-written as:

$\begin{matrix}{\overset{\sim}{w} = {TL^{- H}\overset{\sim}{\underset{\_}{w}}}} & (89)\end{matrix}$

where:

$\begin{matrix}{\overset{\sim}{\underset{\_}{w}} = {\frac{\overset{\sim}{\underset{\_}{h}}}{{\overset{\sim}{\underset{\_}{h}}}^{H}\overset{\sim}{\underset{\_}{h}}} = {\begin{bmatrix}0 \\\vdots \\0 \\{\overset{\sim}{\underset{\_}{w}}}_{\lambda,v}\end{bmatrix} = {\frac{l_{M_{a}}v_{1}^{*}}{{\overset{\sim}{h}}_{a}}\begin{bmatrix}0 \\\vdots \\0 \\v_{1} \\v_{e}\end{bmatrix}}}}} & (90)\end{matrix}$

Therefore, the corresponding speech estimate will be:

$\begin{matrix}{{\overset{\sim}{z}}_{1} = {{{\overset{\sim}{\underset{\_}{w}}}^{H}\underset{\underset{\underset{\_}{y}}{︸}}{L^{- 1}T^{H}y}} = {\frac{l_{M_{a}}v_{1}}{{\overset{\sim}{h}}_{a}}{v_{\max}^{H}\begin{bmatrix}{{\underset{\_}{y}}_{a},_{M_{a}}} \\{\underset{\_}{y}}_{e}\end{bmatrix}}}}} & (91)\end{matrix}$

IV. Appendix D—MVDR_(a,e) with Estimated RTF Vector

Once again, it will be convenient to re-frame the problem in the pre-whitened-transformed domain similarly to (78):

$\begin{matrix}{\min\limits_{{\hat{R}}_{x,{r\; 1}}}\left\| {\left( {{\underset{\_}{R}}_{yy} - {\underset{\_}{R}}_{nn}} \right) - {L^{- 1}T^{H}\left( {{\hat{\Phi}}_{x,{r\; 1}}\begin{bmatrix}{\hat{q}}_{a} \\{\hat{q}}_{e}\end{bmatrix}\begin{bmatrix}{\hat{q}}_{a}^{H} & {\hat{q}}_{e}^{H}\end{bmatrix}} \right)TL^{- H}}} \right\|_{F}^{2}} & (92)\end{matrix}$

In this case, however, the problem cannot be reduced to a lower order as the entire RTF vector is being estimated. Hence the solution follows from an EVD on ${\underset{\_}{R}}_{yy}$:

$\begin{matrix}{{\underset{\_}{R}}_{yy} = {Q\Sigma Q^{H}}} & (93)\end{matrix}$

where Q is an (M_(a)+M_(e))×(M_(a)+M_(e)) unitary matrix of eigenvectors and Σ is a diagonal matrix with the associated eigenvalues in descending order. The estimated RTF vector is then given by the principal (first in this case) eigenvector, q_(max):

$\begin{matrix}{\hat{h} = {\begin{bmatrix}{\hat{q}}_{a} \\{\hat{q}}_{e}\end{bmatrix} = \frac{{TLq}_{\max}}{\eta_{q}}}} & (95)\end{matrix}$

where η_(q)=e_(x1)^(T)TLq_(max) and e_(x1)=[1 0 . . . 0|0 . . . 0]^(T). The estimated RTF vector can therefore be used as an alternative to {tilde over (h)} for the MVDR_(a,e):

$\begin{matrix}{\hat{w} = \frac{R_{nn}^{- 1}\hat{h}}{{\hat{h}}^{H}R_{nn}^{- 1}\hat{h}}} & (96)\end{matrix}$

This filter based on estimated quantities can also be reformulated in the pre-whitened-transformed domain. Starting with the definition for the pre-whitened-transformed version of this estimated RTF:

$\begin{matrix}{\underset{\_}{\hat{h}} = {{L^{- 1}T^{H}\hat{h}} = \frac{q_{\max}}{\eta_{q}}}} & (97)\end{matrix}$

Hence (96) becomes:

$\begin{matrix}{\hat{w} = {TL^{- H}\underset{\_}{\hat{w}}}} & (98)\end{matrix}$

where

$\begin{matrix}{\underset{\_}{\hat{w}} = {\frac{\underset{\_}{\hat{h}}}{{\underset{\_}{\hat{h}}}^{H}\underset{\_}{\hat{h}}} = {\eta_{q}^{*}q_{\max}}}} & (99)\end{matrix}$

The corresponding speech estimate using the estimated RTF vector is therefore:

$\begin{matrix}{{\hat{z}}_{1} = {{{\underset{\_}{\hat{w}}}^{H}\underset{\underset{\underset{\_}{y}}{︸}}{L^{- 1}T^{H}y}} = {\eta_{q}q_{\max}^{H}\underset{\_}{y}}}} & (100)\end{matrix}$

What is claimed is:
 1. A method, comprising: receiving sound signalswith at least a local microphone array of a device, wherein the soundsignals comprise at least one target sound; generating an a prioriestimate of the at least one target sound in the received sound signals,wherein the a priori estimate is based at least on a predeterminedlocation of a source of the at least one target sound; generating adirect estimate of the at least one target sound in the received soundsignals, wherein the direct estimate is based at least on a real-timeestimate of a location of a source of the at least one target sound; andgenerating a weighted combination of the a priori estimate and thedirect estimate, wherein the weighted combination is an integratedestimate of the target sound.
 2. The method of claim 1, whereingenerating the a priori estimate of the at least one target sound in thereceived sound signal, comprises: generating the a priori estimate usingonly an a priori relative transfer function (RTF) vector generated fromthe received sound signals.
 3. The method of claim 1, wherein generatingthe direct estimate of the at least one target sound in the receivedsound signals, comprises: generating the direct estimate using only anestimated relative transfer function (RTF) vector for the received soundsignals.
 4. The method of claim 1, wherein generating the weightedcombination of the a priori estimate of the at least one target soundand the direct estimate of the at least one target sound, comprises:weighting the a priori estimate in accordance with a first cost functioncontrolled by a first set of tuning parameters to generate a weighted apriori estimate; weighting the direct estimate in accordance with asecond cost function controlled by a second set of tuning parameters togenerate a weighted direct estimate; and mixing the weighted directestimate with the weighted a priori estimate.
 5. The method of claim 4,further comprising: setting the first set of tuning parameters based onone or more confidence measures associated with the a priori estimate ofthe of the at least one target sound, wherein the one or more confidencemeasures represent an estimate of a reliability of the a prioriestimate.
 6. The method of claim 4, further comprising: setting the second set of tuning parameters based on one or more confidence measures associated with the direct estimate of the at least one target sound, wherein the one or more confidence measures represent an estimate of a reliability of the direct estimate.
 7. The method of claim 1, wherein generating the a priori estimate of the at least one target sound in the received sound signals, comprises: generating the a priori estimate based at least on the predetermined location of a source of the at least one target sound, one or more assumptions regarding characteristics of the local microphone array, and one or more assumptions regarding reverberant characteristics of the at least one target sound.
 8. The method of claim 1, wherein generating the direct estimate of the at least one target sound in the received sound signals, comprises: generating the direct estimate based at least on a real-time estimate of a location of a source of the at least one target sound, estimated characteristics of the local microphone array, and estimated reverberant characteristics of the at least one target sound.
 9. The method of claim 1, further comprising: performing subsequent sound processing operations in the device using the integrated estimate of the target sound.
 10. The method of claim 1, wherein receiving the sound signals with at least a local microphone array of a device, comprises: receiving a first portion of the sound signals with the local microphone array of the device; and receiving a second portion of the sound signals with at least one external microphone.
 11. The method of claim 10, wherein generating the a priori estimate of the at least one target sound in the received sound signals, comprises: generating the a priori estimate using both the first portion of the sound signals and the second portion of the sound signals in accordance with at least the predetermined location of the source of the at least one target sound.
 12. The method of claim 10, wherein generating the direct estimate of the at least one target sound in the received sound signals, comprises: generating the direct estimate using both the first portion of the sound signals and the second portion of the sound signals in accordance with at least the real-time estimate of the location of the source of the at least one target sound.
 13. A device, comprising: a local microphone array configured to receive sound signals, wherein the sound signals comprise at least one target sound; and one or more processors configured to: generate an a priori estimate of the at least one target sound in the received sound signals using only an a priori relative transfer function (RTF) vector generated from the received sound signals, generate a direct estimate of the at least one target sound in the received sound signals using only an a priori relative transfer function (RTF) vector generated from the received sound signals, and generate a weighted combination of the a priori estimate and the direct estimate, wherein the weighted combination is an integrated estimate of the target sound.
 14. The device of claim 13, wherein to generate the weighted combination of the a priori estimate of the at least one target sound and the direct estimate of the at least one target sound, the one or more processors are configured to: weight the a priori estimate in accordance with a first cost function controlled by a first set of tuning parameters to generate a weighted a priori estimate; weight the direct estimate in accordance with a second cost function controlled by a second set of tuning parameters to generate a weighted direct estimate; and mix the weighted direct estimate with the weighted a priori estimate.
 15. The device of claim 14, wherein the one or more processors are configured to: set the first set of tuning parameters based on one or more confidence measures associated with the a priori estimate of the at least one target sound, wherein the one or more confidence measures represent an estimate of a reliability of the a priori estimate.
 16. The device of claim 14, wherein the one or more processors are configured to: set the second set of tuning parameters based on one or more confidence measures associated with the direct estimate of the at least one target sound, wherein the one or more confidence measures represent an estimate of a reliability of the direct estimate.
 17. The device of claim 13, wherein to generate the a priori estimate of the at least one target sound in the received sound signals, the one or more processors are configured to: generate the a priori estimate based at least on a predetermined location of a source of the at least one target sound, one or more assumptions regarding characteristics of the local microphone array, and one or more assumptions regarding reverberant characteristics of the at least one target sound.
 18. The device of claim 13, wherein to generate the direct estimate of the at least one target sound in the received sound signals, the one or more processors are configured to: generate the direct estimate based at least on a real-time estimate of a location of a source of the at least one target sound, estimated characteristics of the local microphone array, and estimated reverberant characteristics of the at least one target sound.
 19. The device of claim 13, wherein the one or more processors are configured to: perform subsequent sound processing operations in the device using the integrated estimate of the target sound.
 20. A system including the device of claim 13, wherein the local microphone array is configured to receive a first portion of the sound signals, and wherein the system comprises: at least one external microphone configured to receive a second portion of the sound signals.
 21. (canceled)
 22. (canceled)
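
By way of illustration only, and without limiting the claims, the weighted combination recited in claims 1 and 4 to 6 might be sketched for a single time-frequency bin as follows. The sketch rests on several assumptions that are not taken from the claims: the noise is treated as spatially white, so that a distortionless filter reduces to w = h / (h^H h); the two cost functions and their tuning parameters are collapsed into a single normalized mixing weight; and the names rtf_beamform, integrated_estimate, conf_prior and conf_direct are hypothetical.

```python
def rtf_beamform(y, rtf):
    """Estimate the target as w^H y with w = h / (h^H h).

    Assumption: spatially white noise, under which a distortionless
    (MVDR-style) filter for the RTF vector h reduces to this form.
    """
    norm = sum(abs(h) ** 2 for h in rtf)
    return sum(h.conjugate() * x for h, x in zip(rtf, y)) / norm


def integrated_estimate(y, rtf_prior, rtf_est, conf_prior, conf_direct):
    """Mix the a priori and direct estimates for one time-frequency bin.

    y           : microphone signals (complex values, one per microphone)
    rtf_prior   : a priori RTF vector (predetermined source location)
    rtf_est     : estimated RTF vector (real-time source location)
    conf_prior,
    conf_direct : hypothetical non-negative confidence measures standing
                  in for the two sets of tuning parameters
    """
    s_prior = rtf_beamform(y, rtf_prior)   # a priori estimate
    s_direct = rtf_beamform(y, rtf_est)    # direct estimate
    total = conf_prior + conf_direct
    if total <= 0:
        raise ValueError("at least one confidence measure must be positive")
    alpha = conf_prior / total             # normalized weight on the a priori estimate
    return alpha * s_prior + (1.0 - alpha) * s_direct


if __name__ == "__main__":
    h_true = [1 + 0j, 0.5 + 0.2j, 0.8 - 0.1j]   # illustrative RTF vector
    s = 2 - 1j                                  # illustrative target sample
    y = [h * s for h in h_true]                 # noise-free microphone signals
    print(integrated_estimate(y, h_true, h_true, 0.3, 0.7))
```

In this simplified form, a low confidence in the real-time RTF estimate shifts the output toward the a priori estimate, and a low confidence in the a priori RTF vector shifts it toward the direct estimate. An actual embodiment may derive the weights differently.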